diff --git a/examples/milvus-text-search-engine/README.md b/examples/milvus-text-search-engine/README.md new file mode 100644 index 0000000000..0429829d93 --- /dev/null +++ b/examples/milvus-text-search-engine/README.md @@ -0,0 +1,311 @@ +# DeepSparse + Milvus: Text-Search with BERT + +This example demonstrates how to create a semantic search engine using FastAPI, DeepSparse, Milvus, and MySQL. + +We will create 4 services: +- Milvus Server - vector database used to hold the embeddings of the article dataset and perform the search queries +- MySQL Server - holds the mapping from Milvus ids to original article data +- DeepSparse Server - inference runtime used to generate the embeddings for the queries +- Application Server - endpoint called by the client side with queries for searching + +We will demonstrate running on a local machine as well as in a VPC on AWS with independent scaling of the App, Database, and Model Serving components. + +## Application Architecture + +We have provided a sample dataset in `client/example.csv`. The data are articles about various topics, in `(title,text)` pairs. We will create an application that allows users to upload arbitrary `text` and find the 10 most similar articles using semantic search. + +The app server is built on FastAPI and exposes both `/load` and `/search` endpoints. + +The `/load` endpoint accepts a csv file with `(title, text)` representing a series of articles. On `/load`, we project the `text` into the embedding space with BERT running on DeepSparse. We then store each embedding in Milvus with a primary key `id` and store the `(id,title,text)` triples in MySQL. + +The `/search` endpoint enables clients to send `text` to the server. The app server sends the `text` to DeepSparse Server, which returns the embedding of the query. This embedding is sent to Milvus, which searches for the 10 most similar vectors in the database and returns their `ids` to the app server. 
The app server then looks up the `(title,text)` in MySQL and returns them to the client. + +As such, we can scale the app server, databases, and model service independently! + +## Running Locally + +### Start the Server + +#### Installation: +- Milvus and MySQL are installed using Docker containers. [Install Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/linux/). +- DeepSparse is installed via PyPI. Create a virtual environment and run `pip install -r server/deepsparse-requirements.txt`. +- The App Server is based on FastAPI. Create a virtual environment and run `pip install -r server/app-requirements.txt`. + +#### 1. Start Milvus + +Milvus provides a convenient `docker-compose` file, downloadable with `wget`, that launches the services Milvus needs. + +```bash +cd server/database-server +wget https://raw.githubusercontent.com/milvus-io/milvus/master/deployments/docker/standalone/docker-compose.yml -O docker-compose.yml +sudo docker-compose up +cd .. + +``` +This command should create `milvus-etcd`, `milvus-minio`, and `milvus-standalone`. + +#### 2. Start MySQL + +MySQL can be started with the base MySQL image available on Docker Hub. Simply run the following command. + +```bash +docker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7 +``` + +#### 3. Start DeepSparse Server + +DeepSparse not only includes a high-performance runtime on CPUs, but also comes with tooling that simplifies the process of adding inference to an application. One example of this is the Server functionality, which makes it trivial to stand up a model service using DeepSparse. + +We have provided a configuration file in `server/deepsparse-server/server-config-deepsparse.yaml`, which sets up an embedding extraction endpoint running a sparse version of BERT from SparseZoo. You can edit this file to adjust the number of workers you want (this is the number of concurrent inferences that can occur). 
Generally, `num_cores/2` is a fine starting point. + +Here's what the config file looks like. + +```yaml +num_workers: 4 # number of streams - should be tuned, num_cores / 2 is a good place to start + +endpoints: + - task: embedding_extraction + model: zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned80_quant-none-vnni + route: /predict + name: embedding_extraction_pipeline + kwargs: + return_numpy: False + extraction_strategy: reduce_mean + sequence_length: 512 + engine_type: deepsparse +``` + +To start DeepSparse, run the following: + +```bash +deepsparse.server --config_file server/deepsparse-server/server-config-deepsparse.yaml +``` + +TO BE REMOVED --- hack to remove bug in Server + +- Run `vim deepsparse-env/lib/python3.8/site-packages/deepsparse/server/server.py` +- In `_add_pipeline_endpoint()`, update `app.add_api_route` by commenting out `response_model=output_schema`. + +In vim, `i` enters insert mode, `ESC` exits insert mode, and `:wq` writes the file and quits. + +**Potential Improvements** + +There is both a throughput-focused step (`load`), where we need to process a large number of embeddings at once with no latency requirements, and a latency-focused step (`search`), where we need to process one embedding and return to the user as fast as possible. For simplicity, we currently only use one configuration of DeepSparse with `batch_size=1`, which is a latency-oriented setup. + +An extension to this project would be configuring DeepSparse to have multiple endpoints or adding another DeepSparse Server instance with a configuration for high throughput. + +#### 4. Start The App Server + +The App Server is built on `FastAPI` and `uvicorn` and orchestrates DeepSparse, Milvus, and MySQL to create a search engine. + +Run the following to launch. + +```bash +python3 server/app-server/src/app.py +``` + +### Use the Search Engine! 
+ +We have provided both a Jupyter notebook and a latency testing script to interact with the server. + +#### Jupyter Notebook +The Jupyter notebook is self-documenting and is a good starting point to play around with the application. + +You can run it with the following command: +`jupyter notebook example-client.ipynb` + +#### Latency Testing Script +The latency testing script generates multiple clients to test response time from the server. It provides metrics on both overall query latency and model serving query latency (the end-to-end time from the app server querying DeepSparse until a response is returned). + +You can run it with the following command: +```bash +python3 client/latency-test-client.py --url http://localhost:5000/ --dataset_path client/example.csv --num_clients 8 +``` +- `--url` is the location of the app server +- `--dataset_path` is the path to the dataset on the client side +- `--num_clients` is the number of clients that will be created to send requests concurrently + +## Running in an AWS VPC with Independent-Scaling + +### Create a VPC + +First, we will create a VPC that houses our instances and enables us to communicate between the App Server, Milvus, MySQL, and DeepSparse. + +- Navigate to `Create VPC` in the AWS console +- Select `VPC and more`. Name it `semantic-search-demo-vpc` +- Make sure you have `IPv4 CIDR block` set. We use `10.0.0.0/16` in the example. +- Set Number of AZs to 1, Number of Public Subnets to 1, and Number of Private Subnets to 0. + +When we create our services, we will add them to the VPC and only enable communication to the backend model service and databases from within the VPC, isolating the model and database services from the internet. + +### Create a Database Instance + +Launch an EC2 Instance. +- Navigate to EC2 > Instances > Launch an Instance +- Name the instance `database-server` +- Select Amazon Linux + +Edit the `Network Setting`. 
+- Put the `database-server` into the `semantic-search-demo-vpc` VPC +- Choose the public subnet +- Set `Auto-Assign Public IP` to `Enabled`. +- Add a `Custom TCP` security group rule with port `19530` with `source-type` of `Custom` and Source equal to the CIDR of the VPC (in our case `10.0.0.0/16`). This is how the App Server will talk to Milvus +- Add a `Custom TCP` security group rule with port `3306` with `source-type` of `Custom` and Source equal to the CIDR of the VPC (in our case `10.0.0.0/16`). This is how the App Server will talk to MySQL + +Launch the instance, then SSH into it to start the databases. +``` +ssh -i path/to/your/private-key.pem ec2-user@your-instance-public-ip +``` +Install Docker/Docker Compose and add group membership for the default ec2-user: +``` +sudo yum update -y +sudo yum install docker -y +sudo usermod -a -G docker ec2-user +id ec2-user +newgrp docker +pip3 install --user docker-compose +``` + +Start Docker and check that it is running with the following: +``` +sudo service docker start +docker container ls +``` + +Download the Milvus `docker-compose` file and launch Milvus with `docker-compose`: +``` +wget https://raw.githubusercontent.com/milvus-io/milvus/master/deployments/docker/standalone/docker-compose.yml -O docker-compose.yml +docker-compose up +``` + +SSH from another terminal into the same instance to set up MySQL. +``` +ssh -i path/to/your/private-key.pem ec2-user@your-instance-public-ip +``` + +Run the following to launch MySQL: +```bash +docker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7 +``` + +Your databases are up and running! + +### Create the Application Server + +Launch an EC2 Instance. +- Navigate to EC2 > Instances > Launch an Instance +- Name the instance `app-server` +- Select Amazon Linux + +Edit the `Network Setting` to expose the App Endpoint to the Internet while still giving access to the backend database and model service. 
+- Put the `app-server` into the `semantic-search-demo-vpc` VPC +- Choose the public subnet +- Set `Auto-Assign Public IP` to `Enabled`. +- Add a `Custom TCP` security group rule with port `5000` with `source-type` of `Anywhere`. This exposes the app to the internet. + +Click Launch Instance, then SSH into the new instance to launch the app server. + +From the command line run: +``` +ssh -i path/to/your/private-key.pem ec2-user@your-instance-public-ip +``` + +Clone this repo with Git: +```bash +sudo yum update -y +sudo yum install git -y +sudo git clone https://github.com/rsnm2/deepsparse-milvus.git +``` + +Install the App requirements in a virtual environment. +```bash +python3 -m venv app-env +source app-env/bin/activate +pip3 install -r deepsparse-milvus/text-search-engine/server/app-requirements.txt +``` + +Run the following to launch the app server. +```bash +python3 deepsparse-milvus/text-search-engine/server/app-server/src/app.py --database_host private.ip.of.database.server --model_host private.ip.of.model.server +``` + +Your App Server is up and running! + +### Create DeepSparse AWS Instance + +Launch an EC2 Instance. +- Navigate to EC2 > Instances > Launch an Instance +- Name the instance (for example, `deepsparse-server`) +- Select Amazon Linux and a `c6i.4xlarge` instance type + +Edit the `Network Setting` so that the model service is reachable only from within the VPC. +- Put the instance into the `semantic-search-demo-vpc` VPC +- Choose the public subnet +- Set `Auto-Assign Public IP` to `Enabled`. +- Add a `Custom TCP` security group rule with port `5543` with `source-type` of `Custom` and Source equal to the CIDR of the VPC (in our case `10.0.0.0/16`). This is how the App Server will talk to DeepSparse + +Click Launch Instance, then SSH into the new instance to launch the DeepSparse Server. 
+``` +ssh -i path/to/your/private-key.pem ec2-user@your-instance-public-ip +``` + +Clone this repo with Git: +```bash +sudo yum update -y +sudo yum install git -y +git clone https://github.com/rsnm2/deepsparse-milvus.git +``` + +Install the DeepSparse requirements in a virtual environment. +```bash +python3 -m venv deepsparse-env +source deepsparse-env/bin/activate +pip3 install -r deepsparse-milvus/text-search-engine/server/deepsparse-requirements.txt +``` + +TO BE REMOVED --- hack to remove bug in Server + +- Run `vim deepsparse-env/lib/python3.7/site-packages/deepsparse/server/server.py` +- In `_add_pipeline_endpoint()`, update `app.add_api_route` by commenting out `response_model=output_schema`. + + +Run the following to start a model server with DeepSparse as the runtime engine. +```bash +deepsparse.server --config-file deepsparse-milvus/text-search-engine/server/deepsparse-server/server-config-deepsparse.yaml +``` + +You should see a Uvicorn server running! + +We have also provided a config file with ONNX as the runtime engine for performance comparison. +You can launch a server with ONNX Runtime with the following: +```bash +deepsparse.server --config-file deepsparse-milvus/text-search-engine/server/deepsparse-server/server-config-onnx.yaml +``` +**Note: you should have either DeepSparse or ONNX Runtime running, but not both.** + +### Benchmark Performance + +From your local machine, run the following, which creates 4 clients that continuously make requests to the server. + +```bash +python3 client/latency-test-client.py --url http://app-server-public-ip:5000/ --dataset_path client/example.csv --num_clients 4 --iters_per_client 25 +``` + +With DeepSparse running in the Model Server, the latency looks like this, where Model Latency is the time it takes the Model Server to process +a request and Query Latency is the full end-to-end time on the client side (Network Latency + Model Latency + Database Latency). 
+ +``` +Model Latency Stats: +{'count': 100, + 'mean': 97.6392858400186, + 'median': 97.46583750006721, + 'std': 0.7766356131548698} + +Query Latency Stats: +{'count': 100, + 'mean': 425.1315195999632, + 'median': 425.0526745017851, + 'std': 34.73163016766087} +``` + +**RS Note: when scaling this out with more clients, the rest of the system becomes the bottleneck for scaling. So, need to investigate a bit more how to show off the performance of DeepSparse** diff --git a/examples/milvus-text-search-engine/client/example-client.ipynb b/examples/milvus-text-search-engine/client/example-client.ipynb new file mode 100644 index 0000000000..dc9ae37f4e --- /dev/null +++ b/examples/milvus-text-search-engine/client/example-client.ipynb @@ -0,0 +1,260 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "72ce5274", + "metadata": {}, + "source": [ + "# **DeepSparse-Milvus Semantic Search Client**\n", + "\n", + "This notebook demonstrates how to interact with the Semantic Search application over HTTP." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "d7251e09", + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "\n", + "# if running on localhost\n", + "# base_url = \"localhost\"\n", + "\n", + "# if running on AWS - add your app server's public IP\n", + "base_url = \"34.227.11.121\"" + ] + }, + { + "cell_type": "markdown", + "id": "dcc16555", + "metadata": {}, + "source": [ + "### **Drop Existing Data**\n", + "\n", + "`/drop` route used to drop existing collection if it exists." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "026f4da3", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\"Collection does not exist\"\n" + ] + } + ], + "source": [ + "# /drop path drops the collection\n", + "url = f'http://{base_url}:5000/drop'\n", + "\n", + "response = requests.post(url)\n", + "print(response.text)" + ] + }, + { + "cell_type": "markdown", + "id": "392f639e", + "metadata": {}, + "source": [ + "### **Load New Data**\n", + "\n", + "`/load` route used to generate embeddings with DeepSparse and load data into Milvus.\n", + "\n", + "The `/load` path accepts a `csv` file with `(title, text)` pairs. When called, this endpoint generates `embeddings` with DeepSparse on each `text` and inserts the `(id, embedding)` pairs into Milvus and the `(id, title, text)` triples into MySQL. Since we are generating embeddings for every element in the list, this may take a bit of time.\n", + "\n", + "There is an `example.csv` file available." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "f5c8f836", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\"Successfully loaded data. Inference Time 4.723909034999906; DB Load Time 3.0950468900000487\"\n" + ] + } + ], + "source": [ + "url = f'http://{base_url}:5000/load'\n", + "\n", + "response = requests.post(url, files={\n", + " 'file': open('example.csv', 'rb')\n", + "})\n", + "print(response.text)" + ] + }, + { + "cell_type": "markdown", + "id": "78389ce0", + "metadata": {}, + "source": [ + "### **Check It Worked**\n", + "\n", + "`/count` route allows you to check the number of elements in the database. You should see 160 if you used the `example.csv` file." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a0e23382", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "160\n" + ] + } + ], + "source": [ + "url = f'http://{base_url}:5000/count'\n", + "\n", + "response = requests.post(url)\n", + "print(response.text)" + ] + }, + { + "cell_type": "markdown", + "id": "578e6e52", + "metadata": {}, + "source": [ + "### **Search The Database**\n", + "\n", + "`/search` endpoint accepts a sentence over `GET`. Let's send 50 queries to see what the responses and latency look like." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "94d6534f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The United States has brokered a cease-fire between a renegade Afghan militia leader and the embattled governor of the western province of Herat, Washington's envoy to Kabul said Tuesday.\n" + ] + } + ], + "source": [ + "sentence = \"The United States has brokered a cease-fire between a renegade Afghan militia leader and the embattled governor of the western province of Herat, Washington's envoy to Kabul said Tuesday.\"\n", + "print(sentence)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "343ee35c", + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "for _ in range(50):\n", + " query_str = \"query_sentence=\" + sentence\n", + " url = f'http://{base_url}:5000/search'\n", + " response = json.loads(requests.get(url, query_str).text)" + ] + }, + { + "cell_type": "markdown", + "id": "8de9d77d", + "metadata": {}, + "source": [ + "We can see that the responses include articles that are similar to the baseline text." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "720c7177", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "U.S. 
Brokers Cease-fire in Western Afghanistan\n", + "Afghan Army Dispatched to Calm Violence\n", + "Delegates Urge Cleric to Pull Out of Najaf\n", + "Delegation Is Delayed Before Reaching Najaf\n", + "Karzai Promises Afghans Security for Election (Reuters)\n", + "Georgian president calls for international conference on South Ossetia\n", + "Fresh Fighting Shatters Short-Lived Ceasefire Deal\n", + "Iran Warns Its Missiles Can Hit Anywhere in Israel\n", + "Peace delegation leaves Najaf empty-handed as fighting continues\n" + ] + } + ], + "source": [ + "for idx in response:\n", + " print(response[idx]['title'])" + ] + }, + { + "cell_type": "markdown", + "id": "18098ac2", + "metadata": {}, + "source": [ + "### **Check Latency**\n", + "\n", + "`/latency` endpoint checks the latency of recent calls. Calling the endpoint also clears old latency data." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "4a9ad6e9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'count': 50, 'median': 99.72352400006912, 'mean': 99.62290793999728, 'std': 0.7136873781060545}\n" + ] + } + ], + "source": [ + "url = f'http://{base_url}:5000/latency'\n", + "\n", + "response = requests.post(url)\n", + "print(json.loads(response.text))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/milvus-text-search-engine/client/example.csv b/examples/milvus-text-search-engine/client/example.csv new file mode 100644 index 0000000000..16804cda9d --- /dev/null +++ b/examples/milvus-text-search-engine/client/example.csv @@ -0,0 +1,161 @@ 
+title,text +Fears for T N pension after talks,Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul. +The Race is On: Second Private Team Sets Launch Date for Human Spaceflight ,"SPACE.com - TORONTO, Canada -- A second team of rocketeers competing for the 36;10 million Ansari X Prize, a contest for privately funded suborbital space flight, has officially announced the first launch date for its manned rocket." +Ky. Company Wins Grant to Study Peptides ," A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins." +Prediction Unit Helps Forecast Wildfires ," It's barely dawn when Mike Fitzpatrick starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar." +Calif. Aims to Limit Farm-Related Smog ," Southern California's smog-fighting agency went after emissions of the bovine variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure." +Open Letter Against British Copyright Indoctrination in Schools,"The British Department for Education and Skills (DfES) recently launched a Music Manifesto campaign, with the ostensible intention of educating the next generation of British musicians. Unfortunately, they also teamed up with the music industry (EMI, and various artists) to make this popular. EMI has apparently negotiated their end well, so that children in our schools will now be indoctrinated about the illegality of downloading music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. 
But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area." +Loosing the War on Terrorism," Sven Jaschan, self-confessed author of the Netsky and Sasser viruses, is responsible for 70 percent of virus infections in 2004, according to a six-month virus roundup published Wednesday by antivirus company Sophos. The 18-year-old Jaschan was taken into custody in Germany in May by police who said he had admitted programming both the Netsky and Sasser worms, something experts at Microsoft confirmed. (A Microsoft antivirus reward program led to the teenager's arrest.) During the five months preceding Jaschan's capture, there were at least 25 variants of Netsky and one of the port-scanning network worm Sasser. Graham Cluley, senior technology consultant at Sophos, said it was staggeri " +"FOAFKey: FOAF, PGP, Key Distribution, and Bloom Filters", FOAF/LOAF and bloom filters have a lot of interesting properties for social network and whitelist distribution. I think we can go one level higher though and include GPG/OpenPGP key fingerpring distribution in the FOAF file for simple web-of-trust based key distribution. What if we used FOAF and included the PGP key fingerprint(s) for identities? This could mean a lot. You include the PGP key fingerprints within the FOAF file of your direct friends and then include a bloom filter of the PGP key fingerprints of your entire whitelist (the source FOAF file would of course need to be encrypted ). Your whitelist would be populated from the social network as your client discovered new identit +E-mail scam targets police chief,Wiltshire Police warns about phishing after its fraud squad chief was targeted. +"Card fraud unit nets 36,000 cards","In its first two years, the UK's dedicated card fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m." 
+Group to Propose New High-Speed Wireless Format," LOS ANGELES (Reuters) - A group of technology companies including Texas Instruments Inc. TXN.N>, STMicroelectronics STM.PA> and Broadcom Corp. BRCM , on Thursday said they will propose a new wireless networking standard up to 10 times the speed of the current generation." +"Apple Launches Graphics Software, Video Bundle", LOS ANGELES (Reuters) - Apple Computer Inc. AAPL on Tuesday began shipping a new program designed to let users create real-time motion graphics and unveiled a discount video-editing software bundle featuring its flagship Final Cut Pro software. +Dutch Retailer Beats Apple to Local Download Market," AMSTERDAM (Reuters) - Free Record Shop, a Dutch music retail chain, beat Apple Computer Inc. to market on Tuesday with the launch of a new download service in Europe's latest battleground for digital song services." +Super ant colony hits Australia,"A giant 100km colony of ants which has been discovered in Melbourne, Australia, could threaten local insect species." +Socialites unite dolphin groups,"Dolphin groups, or pods , rely on socialites to keep them from collapsing, scientists claim." +Teenage T. rex's monster growth,Tyrannosaurus rex achieved its massive size due to an enormous growth spurt during its adolescent years. +Scientists Discover Ganymede has a Lumpy Interior,"Jet Propulsion Lab -- Scientists have discovered irregular lumps beneath the icy surface of Jupiter's largest moon, Ganymede. These irregular masses may be rock formations, supported by Ganymede's icy shell for billions of years " +Mars Rovers Relay Images Through Mars Express,"European Space Agency -- ESAs Mars Express has relayed pictures from one of NASA's Mars rovers for the first time, as part of a set of interplanetary networking demonstrations. The demonstrations pave the way for future Mars missions to draw on joint interplanetary networking capabilities " +Rocking the Cradle of Life,"When did life begin? 
One evidential clue stems from the fossil records in Western Australia, although whether these layered sediments are biological or chemical has spawned a spirited debate. Oxford researcher, Nicola McLoughlin, describes some of the issues in contention." +"Storage, servers bruise HP earnings","update Earnings per share rise compared with a year ago, but company misses analysts' expectations by a long shot." +IBM to hire even more new workers,"By the end of the year, the computing giant plans to have its biggest headcount since 1991." +Sun's Looking Glass Provides 3D View,Developers get early code for new operating system 'skin' still being crafted. +IBM Chips May Someday Heal Themselves,New technology applies electrical fuses to help identify and repair faults. +Some People Not Eligible to Get in on Google IPO,"Google has billed its IPO as a way for everyday people to get in on the process, denying Wall Street the usual stranglehold it's had on IPOs. Public bidding, a minimum of just five shares, an open process with 28 underwriters - all this pointed to a new level of public participation. But this isn't the case." +Rivals Try to Turn Tables on Charles Schwab,"By MICHAEL LIEDTKE SAN FRANCISCO -- With its low prices and iconoclastic attitude, discount stock broker Charles Schwab Corp. (SCH) represented an annoying stone in Wall Street's wing-tipped shoes for decades " +News: Sluggish movement on power grid cyber security,Industry cyber security standards fail to reach some of the most vulnerable components of the power grid. +Giddy Phelps Touches Gold for First Time,Michael Phelps won the gold medal in the 400 individual medley and set a world record in a time of 4 minutes 8.26 seconds. +Tougher rules won't soften Law's game,"FOXBOROUGH -- Looking at his ridiculously developed upper body, with huge biceps and hardly an ounce of fat, it's easy to see why Ty Law, arguably the best cornerback in football, chooses physical play over finesse. 
That's not to imply that he's lacking a finesse component, because he can shut down his side of the field much as Deion Sanders " +Shoppach doesn't appear ready to hit the next level,"With the weeks dwindling until Jason Varitek enters free agency, the Red Sox continue to carefully monitor Kelly Shoppach , their catcher of the future, in his climb toward the majors. The Sox like most of what they have seen at Triple A Pawtucket from Shoppach, though it remains highly uncertain whether he can make the adjustments at the plate " +Mighty Ortiz makes sure Sox can rest easy,"Just imagine what David Ortiz could do on a good night's rest. Ortiz spent the night before last with his baby boy, D'Angelo, who is barely 1 month old. He had planned on attending the Red Sox' Family Day at Fenway Park yesterday morning, but he had to sleep in. After all, Ortiz had a son at home, and he " +They've caught his eye,"In helping themselves, Ricky Bryant, Chas Gessner, Michael Jennings, and David Patten did nothing Friday night to make Bill Belichick's decision on what to do with his receivers any easier." +Indians Mount Charge,"The Cleveland Indians pulled within one game of the AL Central lead by beating the Minnesota Twins, 7-1, Saturday night with home runs by Travis Hafner and Victor Martinez." +Sister of man who died in Vancouver police custody slams chief (Canadian Press),Canadian Press - VANCOUVER (CP) - The sister of a man who died after a violent confrontation with police has demanded the city's chief constable resign for defending the officer involved. +"Man Sought 36;50M From McGreevey, Aides Say "," The man who claims Gov. James E. McGreevey sexually harassed him was pushing for a cash settlement of up to 36;50 million before the governor decided to announce that he was gay and had an extramarital affair, sources told The Associated Press." +Explosions Echo Throughout Najaf,"NAJAF, Iraq - Explosions and gunfire rattled through the city of Najaf as U.S. 
troops in armored vehicles and tanks rolled back into the streets here Sunday, a day after the collapse of talks - and with them a temporary cease-fire - intended to end the fighting in this holy city " +Frail Pope Celebrates Mass at Lourdes,"LOURDES, France - A frail Pope John Paul II, breathing heavily and gasping at times, celebrated an open-air Mass on Sunday for several hundred thousand pilgrims, many in wheelchairs, at a shrine to the Virgin Mary that is associated with miraculous cures. At one point he said help me in Polish while struggling through his homily in French " +Venezuela Prepares for Chavez Recall Vote,Supporters and rivals warn of possible fraud; government says Chavez's defeat could produce turmoil in world oil market. +1994 Law Designed to Preserve Guard Jobs , A 1994 law strengthened job protections for National Guard and Reserve troops called to active duty. Here are major provisions of the Uniformed Services Employment and Reemployment Rights Act (USERRA). +Iran Warns Its Missiles Can Hit Anywhere in Israel," TEHRAN (Reuters) - A senior Iranian military official said Sunday Israel and the United States would not dare attack Iran since it could strike back anywhere in Israel with its latest missiles, news agencies reported." +Afghan Army Dispatched to Calm Violence,"KABUL, Afghanistan - Government troops intervened in Afghanistan's latest outbreak of deadly fighting between warlords, flying from the capital to the far west on U.S. and NATO airplanes to retake an air base contested in the violence, officials said Sunday " +Johnson Helps D-Backs End Nine-Game Slide ," Randy Johnson took a four-hitter into the ninth inning to help the Arizona Diamondbacks end a nine-game losing streak Sunday, beating Steve Trachsel and the New York Mets 2-0." 
+Retailers Vie for Back-To-School Buyers (Reuters),"Reuters - Apparel retailers are hoping their back-to-school fashions will make the grade among style-conscious teens and young adults this fall, but it could be a tough sell, with students and parents keeping a tighter hold on their wallets." +Politics an Afterthought Amid Hurricane ," If Hurricane Charley had struck three years ago, President Bush's tour through the wreckage of this coastal city would have been just the sort of post-disaster visit that other presidents have made to the scenes of storms, earthquakes, floods and fires." +Spam suspension hits Sohu.com shares (FT.com),"FT.com - Shares in Sohu.com, a leading US-listed Chinese internet portal, fell more than 10 per cent on Friday after China's biggest mobile phone network operator imposed a one-year suspension on its multimedia messaging services because of customers being sent spam." +Erstad's Double Lifts Angels to Win ," Darin Erstad doubled in the go-ahead run in the eighth inning, lifting the Anaheim Angels to a 3-2 victory over the Detroit Tigers on Sunday. The win pulled Anaheim within a percentage point of Boston and Texas in the AL wild-card race." +Drew Out of Braves' Lineup After Injury , Outfielder J.D. Drew missed the Atlanta Braves' game against the St. Louis Cardinals on Sunday night with a sore right quadriceps. +"Venezuelans Flood Polls, Voting Extended"," CARACAS, Venezuela (Reuters) - Venezuelans voted in huge numbers on Sunday in a historic referendum on whether to recall left-wing President Hugo Chavez and electoral authorities prolonged voting well into the night." +Dell Exits Low-End China Consumer PC Market," HONG KONG (Reuters) - Dell Inc. DELL , the world's largest PC maker, said on Monday it has left the low-end consumer PC market in China and cut its overall growth target for the country this year due to stiff competition in the segment." +China Says Taiwan Spy Also Operated in U.S. 
- Media," BEIJING (Reuters) - Beijing on Monday accused a Chinese-American arrested for spying for Taiwan of building an espionage network in the United States, and said he could go on trial very soon." +Another Major Non-Factor,"Another major, another disappointment for Tiger Woods, the No. 1 ranked player in the world who has not won a major championship since his triumph at the 2002 U.S. Open." +US fighter squadron to be deployed in South Korea next month ," A squadron of US Air Force F-15E fighters based in Alaska will fly to South Korea next month for temporary deployment aimed at enhancing US firepower on the Korean peninsula, US authorities said." +Johnson Back to His Best as D-Backs End Streak, NEW YORK (Reuters) - Randy Johnson struck out 14 batters in 8 1/3 innings to help the Arizona Diamondbacks end a nine-game losing streak with a 2-0 win over the host New York Mets in the National League Sunday. +Restive Maldives eases curfew after rounding up dissidents ," A curfew in the capital of the Maldives was eased but parliament sessions were put off indefinitely and emergency rule continued following last week's riots, officials and residents said." +Vodafone hires Citi for Cesky bid (TheDeal.com),TheDeal.com - The U.K. mobile giant wants to find a way to disentagle the Czech wireless and fixed-line businesses. +Dollar Briefly Hits 4-Wk Low Vs Euro," LONDON (Reuters) - The dollar dipped to a four-week low against the euro on Monday before rising slightly on profit-taking, but steep oil prices and weak U.S. data continued to fan worries about the health of the world's largest economy." +Promoting a Shared Vision,"As Michael Kaleko kept running into people who were getting older and having more vision problems, he realized he could do something about it." +India's Tata expands regional footprint via NatSteel buyout , India's Tata Iron and Steel Company Ltd. 
took a strategic step to expand its Asian footprint with the announcement it will buy the Asia-Pacific steel operations of Singapore's NatSteel Ltd. +Delegates Urge Cleric to Pull Out of Najaf,"BAGHDAD, Iraq - Delegates at Iraq's National Conference called on radical Shiite cleric Muqtada al-Sadr to abandon his uprising against U.S. and Iraqi troops and pull his fighters out of a holy shrine in Najaf " +Treasuries Slip as Stocks Rally," NEW YORK (Reuters) - U.S. Treasury debt prices slipped on Monday, though traders characterized the move as profit-taking rather than any fundamental change in sentiment." +Dollar Rises Vs Euro on Asset Flows Data, NEW YORK (Reuters) - The dollar extended gains against the euro on Monday after a report on flows into U.S. assets showed enough of a rise in foreign investments to offset the current account gap for the month. +"Sutton Adds Haas, Cink to Ryder Cup Team", MILWAUKEE (Sports Network) - U.S. Ryder Cup captain Hal Sutton finalized his team on Monday when he announced the selections of Jay Haas and Stewart Cink as his captain's picks. +Haas and Cink Selected for Ryder Cup Team,Jay Haas joined Stewart Cink as the two captain's picks for a U.S. team that will try to regain the cup from Europe next month. +Natalie Coughlin Wins 100M Backstroke ," American Natalie Coughlin won Olympic gold in the 100-meter backstroke Monday night. Coughlin, the only woman ever to swim under 1 minute in the event, finished first in 1 minute, 0.37 seconds. Kirsty Coventry of Zimbabwe, who swims at Auburn University in Alabama, earned the silver in 1:00.50. Laure Manaudou of France took bronze in 1:00.88." +Oracle Overhauls Sales-Side Apps for CRM Suite (NewsFactor),"NewsFactor - Oracle (Nasdaq: ORCL) has revamped its sales-side CRM applications in version 11i.10 of its sales, marketing, partner relationship management and e-commerce application." 
+UN launches 210-million-dollar appeal for flood-hit Bangladesh ," The United Nations launched an appeal here for 210 million dollars to help flood victims facing grave food shortages after two-thirds of Bangladesh was submerged, destroying crops and killing more than 700 people." +Indian state rolls out wireless broadband,Government in South Indian state of Kerala sets up wireless kiosks as part of initiative to bridge digital divide. +"Hurricane Survivors Wait for Water, Gas","PUNTA GORDA, Fla. - Urban rescue teams, insurance adjusters and National Guard troops scattered across Florida Monday to help victims of Hurricane Charley and deliver water and other supplies to thousands of people left homeless " +Jackson Squares Off With Prosecutor,"SANTA MARIA, Calif. - Fans of Michael Jackson erupted in cheers Monday as the pop star emerged from a double-decker tour bus and went into court for a showdown with the prosecutor who has pursued him for years on child molestation charges " +Bobcats Trade Drobnjak to Hawks for Pick , The Charlotte Bobcats traded center Predrag Drobnjak to the Atlanta Hawks on Monday for a second round pick in the 2005 NBA draft. +"Suspect charged in abduction, sexual assault of 11-year-old girl (Canadian Press)","Canadian Press - LANGLEY, B.C. (CP) - Police have arrested a man in the kidnapping and sexual assault of an 11-year-old girl that frightened this suburban Vancouver community last week." +China's Red Flag Linux to focus on enterprise,"Red Flag Software Co., the company behind China's leading Linux client distribution, plans to focus more on its server operating system and enterprise customers, the company's acting president said." +AOL Properties Sign Girafa For Thumbnail Search Images,"AOL Properties Sign Girafa For Thumbnail Search Images Girafa.com Inc. 
announced today that the CompuServe, Netscape, AIM and ICQ properties of America Online, Inc., have signed an agreement with Girafa to use Girafa's thumbnail search images as an integrated part of their search results. Using Girafa's thumbnail search service, search users can " +Cassini Spies Two Little Saturn Moons ," NASA's Cassini spacecraft has spied two new little moons around satellite-rich Saturn, the space agency said Monday." +On front line of AIDS in Russia,An industrial city northwest of Moscow struggles as AIDS hits a broader population. +Nobel Laureate Decries Stem Cell Limits ," A Nobel laureate in medicine said Monday the Bush administration's limits on funding for embryonic stem cell research effectively have stopped the clock on American scientists' efforts to develop treatments for a host of chronic, debilitating diseases." +Jury Can Hear of Kobe Accuser's Sex Life ," Prosecutors suffered another setback Monday in the Kobe Bryant sexual assault case, losing a last-ditch attempt to keep the NBA star's lawyers from telling jurors about the alleged victim's sex life." +"North Korea Talks Still On, China Tells Downer (Reuters)","Reuters - China has said no date has been set for working-level talks on the North Korean nuclear crisis and gave no indication that the meeting has been canceled, Australian Foreign Minister Alexander Downer said on Tuesday." +Griffin to Anchor D-Line,"The Redskins expect huge things from 300-pound Cornelius Griffin, who was signed to aid the team's weakest unit - the defensive line." +Last American defector in North Korea agrees to tell story ," The last surviving American defector to communist North Korea wants to tell his story to put a human face on the Stalinist state which he believes is unfairly vilified abroad, British film-makers said." +Live: Olympics day four,Richard Faulds and Stephen Parry are going for gold for Great Britain on day four in Athens. 
+"Kerry Widens Lead in California, Poll Finds (Reuters)","Reuters - Democratic challenger John Kerry has a commanding lead over President Bush in California of 54 percent to 38 percent among likely voters, a poll released on Tuesday found." +Capacity Crowds at Beach Volleyball Rock the Joint," ATHENS (Reuters) - At the beach volleyball, the 2004 Olympics is a sell-out, foot-stomping success." +"Dollar Near Recent Lows, Awaits ZEW/CPI", LONDON (Reuters) - The dollar held steady near this week's four-week low against the euro on Tuesday with investors awaiting a German investor confidence survey and U.S. consumer inflation numbers to shed light on the direction. +Intel to delay product aimed for high-definition TVs,"SAN FRANCISCO -- In the latest of a series of product delays, Intel Corp. has postponed the launch of a video display chip it had previously planned to introduce by year end, putting off a showdown with Texas Instruments Inc. in the fast-growing market for high-definition television displays." +Venezuela vote keeps Chavez as president,CARACAS -- Venezuelans voted resoundingly to keep firebrand populist Hugo Chavez as their president in a victory that drew noisy reactions yesterday from both sides in the streets. International observers certified the results as clean and accurate. +Jailing of HK democrat in China 'politically motivated' , Hong Kong democrats accused China of jailing one of their members on trumped-up prostitution charges in a bid to disgrace a political movement Beijing has been feuding with for seven years. +Kmart Swings to Profit in 2Q; Stock Surges , Shares of Kmart Holding Corp. surged 17 percent Monday after the discount retailer reported a profit for the second quarter and said chairman and majority owner Edward Lampert is now free to invest the company's 36;2.6 billion in surplus cash. 
+Fischer's Fiancee: Marriage Plans Genuine ," Former chess champion Bobby Fischer's announcement thathe is engaged to a Japanese woman could win him sympathy among Japanese officials and help him avoid deportation to the United States, his fiancee and one of his supporters said Tuesday." +U.S. Misses Cut in Olympic 100 Free,"ATHENS, Greece - Top American sprinters Jason Lezak and Ian Crocker missed the cut in the Olympic 100-meter freestyle preliminaries Tuesday, a stunning blow for a country that had always done well in the event. Pieter van den Hoogenband of the Netherlands and Australian Ian Thorpe advanced to the evening semifinal a day after dueling teenager Michael Phelps in the 200 freestyle, won by Thorpe " +Consumers Would Pay In Phone Proposal,"A proposal backed by a coalition of telephone carriers would cut billions of dollars in fees owed by long-distance companies to regional phone giants but would allow the regional companies to make up some of the difference by raising monthly phone bills for millions of consumers. FONT face verdana,MS Sans Serif,arial,helvetica size -2 color 666666 > B>-The Washington Post /B> /FONT>" +U.S. Brokers Cease-fire in Western Afghanistan," KABUL (Reuters) - The United States has brokered a cease-fire between a renegade Afghan militia leader and the embattled governor of the western province of Herat, Washington's envoy to Kabul said Tuesday." +Sneaky Credit Card Tactics,Keep an eye on your credit card issuers -- they may be about to raise your rates. +Intel Delays Launch of Projection TV Chip,"In another product postponement, semiconductor giant Intel Corp. said it won't be offering a chip for projection TVs by the end of 2004 as it had announced earlier this year." +Fund pessimism grows,"NEW YORK (CNN/Money) - Money managers are growing more pessimistic about the economy, corporate profits and US stock market returns, according to a monthly survey by Merrill Lynch released Tuesday. 
" +Kederis proclaims innocence,Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his innocence and vowing: After the crucifixion comes the resurrection. +Eriksson doesn 39;t feel any extra pressure following scandal,"NEWCASTLE, England - England coach Sven-Goran Eriksson said Tuesday he isn 39;t under any extra pressure in the aftermath of a scandal that damaged the Football Association reputation. " +Injured Heskey to miss England friendly,"NEWCASTLE, England - Striker Emile Heskey has pulled out of the England squad ahead of Wednesday friendly against Ukraine because of a tight hamstring, the Football Association said Tuesday. " +"Staples Profit Up, to Enter China Market"," NEW YORK (Reuters) - Staples Inc. A HREF http://www.investor.reuters.com/FullQuote.aspx?ticker SPLS.O target /stocks/quickinfo/fullquote >SPLS.O /A>, the top U.S. office products retailer, on Tuesday reported a 39 percent jump in quarterly profit, raised its full-year forecast and said it plans to enter the fast-growing Chinese market, sending its shares higher." +Delegation Is Delayed Before Reaching Najaf,"AGHDAD, Iraq, Aug. 17 A delegation of Iraqis was delayed for security reasons today but still intended to visit Najaf to try to convince a rebellious Shiite cleric and his militia to evacuate a shrine in the holy city and end " +"Consumer Prices Down, Industry Output Up"," WASHINGTON (Reuters) - U.S. consumer prices dropped in July for the first time in eight months as a sharp run up in energy costs reversed, the government said in a report that suggested a slow rate of interest rate hikes is likely." +"Olympic history for India, UAE","An Indian army major shot his way to his country first ever individual Olympic silver medal on Tuesday, while in the same event an member of Dubai ruling family became the first ever medallist from the United Arab Emirates. 
" +Home Depot Likes High Oil,"Rising fuel prices, a bugbear for most of the retail sector, are helping Home Depot (HD:NYSE - news - research), the remodeling giant that reported a surge in second-quarter earnings Tuesday and guided the rest of the year higher. " +China cracks down on phone sex services,"BEIJING, Aug. 17 (Xinhuanet) -- China is carrying out a nationwide campaign to crack down on phone sex services, paralleling another sweeping operation against Internet pornography, Minister of Information Industry Wang Xudong said here Tuesday. " +Surviving Biotech's Downturns,Charly Travers offers advice on withstanding the volatility of the biotech sector. +Mr Downer shoots his mouth off,"Just what Alexander Downer was thinking when he declared on radio last Friday that they could fire a missile from North Korea to Sydney is unclear. The provocative remark, just days before his arrival yesterday on his second visit to the North Korean " +Edwards Banned from Games - Source," ATHENS (Reuters) - World 100 meters champion Torri Edwards will miss the Athens Olympics after her appeal against a two-year drugs ban was dismissed on Tuesday, a source told Reuters." +Stocks Climb on Drop in Consumer Prices,"NEW YORK - Stocks rose for a second straight session Tuesday as a drop in consumer prices allowed investors to put aside worries about inflation, at least for the short term. With gasoline prices falling to eight-month lows, the Consumer Price Index registered a small drop in July, giving consumers a respite from soaring energy prices " +"Iliadis, Tanimoto win judo golds","Ilias Iliadis of Greece thrilled the home crowd Tuesday, beating Roman Gontyuk of Ukraine to win the gold medal in the 81-kilogram class. " +Sudan vows to restore order to Darfur but calls for African peacekeepers ," Sudan will take the lead in restoring order to its rebellious Darfur region but needs the support of African peacekeepers and humanitarian aid, Foreign Minister Mustafa Osman Ismail said." 
+TGn Sync Proposes New WLAN Standard,The battle over home entertainment networking is heating up as a coalition proposes yet another standard for the IEEE consideration. +Yahoo! Ups Ante for Small Businesses,Web giant Yahoo! is gambling that price cuts on its domain name registration and Web hosting products will make it more competitive with discounters in the space -- which means that small businesses looking to move online get a sweeter deal through +IBM Buys Two Danish Services Firms,"IBM said Tuesday it has acquired a pair of Danish IT services firms as part of its effort to broaden its presence in Scandinavia. As a result of the moves, IBM will add about 3,700 IT staffers to its global head count. Financial terms of " +Motorola and HP in Linux tie-up,"Motorola plans to sell mobile phone network equipment that uses Linux-based code, a step forward in network gear makers 39; efforts to rally around a standard. " +Microsoft Pushes Off SP2 Release,"Microsoft will delay the release of its SP2 update for another week to fix software glitches. But not everyone is quite so eager to install the SP2 update for Windows XP. In fact, many companies have demanded the ability to prevent their " +Cassini Space Probe Spots Two New Saturn Moons (Reuters),"Reuters - Two new moons were spotted around Saturn by the Cassini space probe, raising the total to 33 moons for the ringed planet, NASA said on Monday." +Buckeyes have lots to replace but are brimming with optimism,There are remarkable similarities between the 2004 Ohio State Buckeyes and those that won the national championship just two years ago. +IBM adds midrange server to eServer lineup,The new IBM Power5 eServer i5 550 also features higher performance and new virtualization capabilities that allow it to run multiple operating systems at once on separate partitions. +iPod Comparison,"Newsday 146;s Stephen Williams reports on seeing Sony 146;s NW-HD1 audio player in a store: 147; 145;How 146;s it compare to the iPod? 
146; I asked a salesman. 145;Battery life is a lot longer, up to 30 hours, 146; he said. 145;The LCD readout is kind of dim, 146; I said. 146;Battery life is a lot longer, 146; he said. 145;I understand it can 146;t play MP3 files, 146; I said. 145;Battery life is a lot longer, 146; he said. 148; Aug 17" +Mills Grabs 1B Portfolio; Taubman Likely to Lose Contracts,"Mills Corp. agreed to purchase a 50 percent interest in nine malls owned by General Motors Asset Management Corp. for just over 1 billion, creating a new joint venture between the groups. The deal will extend " +Women stumble to silver,ATHENS -- The mistakes were so minor. Carly Patterson foot scraping the lower of the uneven bars. Courtney Kupets 39; tumbling pass that ended here instead of there. Mohini Bhardwaj slight stumble on the beam. +Oil prices bubble to record high,"The price of oil has continued its sharp rise overnight, closing at a record high. The main contract in New York, light sweet crude for delivery next month, has closed at a record US46.75 a barrel - up 70 cents on yesterday close. " +Notable quotes Tuesday at the Athens Olympics," It hurt like hell. I could see (Thorpe) coming up. But when I was breathing, I saw my team going crazy -- and that really kept me going. " +AMD Ships Notebook Chips,"It wasn 39;t the first to go small, and it won 39;t be the biggest producer, but AMD (Quote, Chart) 64-bit 90-nanometer (nm) chips are expected to make waves in the semiconductor pool. " +UK charges 8 in terror plot linked to alert in US,"LONDON, AUGUST 17: Britain charged eight terror suspects on Tuesday with conspiracy to commit murder and said one had plans that could be used in striking US buildings that were the focus of security scares this month. " +IBM Seeks To Have SCO Claims Dismissed (NewsFactor),"NewsFactor - IBM (NYSE: IBM) has -- again -- sought to have the pending legal claims by The SCO Group dismissed. According to a motion it filed in a U.S. 
district court, IBM argues that SCO has no evidence to support its claims that it appropriated confidential source code from Unix System V and placed it in Linux." +SUVs: Live And Let Die,NEW YORK - The newly released traffic crash fatality data have something for everyone in the debate about the safety of sport utility vehicles. +Security scare as intruder dives in,A CANADIAN husband love for his wife has led to a tightening of security at all Olympic venues in Athens. +"Team USA barely wins, but struggles not all players 39; fault","Now that everybody in and around USA Basketball has breathed a huge sigh of relief, let not get carried away. " +UPI NewsTrack Sports,"-- The United States men basketball team capped off a big day for the USA by fighting off Greece for a vital win, 77-71. They played with heart, said Coach Larry Brown. That all you can ask. " +Peace delegation leaves Najaf empty-handed as fighting continues,"BAGHDAD, Iraq - A national political conference bid to end the fighting in the Shiite Muslim holy city of Najaf appeared to have failed Tuesday. " +Georgian president calls for international conference on South Ossetia,"TBILISI, Georgia Georgian President Mikhail Saakashvili appealed to world leaders Tuesday to convene an international conference on the conflict in breakaway South Ossetia, where daily exchanges of gunfire threaten to spark " +"Shelling, shooting resumes in breakaway Georgian region "," Georgian and South Ossetian forces overnight accused each other of trying to storm the other side's positions in Georgia's breakaway region of South Ossetia, as four Georgian soldiers were reported to be wounded." +"Youkilis, McCarty placed on 15-day disabled list","BOSTON -- It was another busy day on the medical front for the Red Sox, as a series of roster moves were announced prior to Tuesday night game against the Blue Jays. " +Kerry-Kerrey Confusion Trips Up Campaign ," John Kerry, Bob Kerrey. It's easy to get confused." 
+Former Florida Swimming Coach Dies at 83 ," William H. Harlan, the retired University of Florida swimming coach who led the Gators to eight conference titles, died Tuesday, school officials said. He was 83." +US Men Have Right Touch in Relay Duel Against Australia,"THENS, Aug. 17 - So Michael Phelps is not going to match the seven gold medals won by Mark Spitz. And it is too early to tell if he will match Aleksandr Dityatin, the Soviet gymnast who won eight total medals in 1980. But those were not the " +Schrder adopts Russian orphan,"Three-year-old Victoria, from St Petersburg, has been living at the Schrders 39; family home in Hanover in northern Germany for several weeks. " +Cabrera Leads Red Sox Past Blue Jays 5-4 ," Orlando Cabrera hit a run-scoring double off the Green Monster in the ninth inning on reliever Justin Speier's second pitch of the game, giving the Boston Red Sox a 5-4 win over the Toronto Blue Jays on Tuesday night." +United Arab Emirates trap shooter secures nation first Olympic gold,Sheik Ahmed bin Hashr Al-Maktoum earned the first-ever Olympic medal for the United Arab Emirates when he took home the gold medal in men double trap shooting on Tuesday in Athens. +"Sharon orders 1,000 homes in West Bank","Israel announced plans for 1,000 houses in the West Bank yesterday, accelerating the expansion of the settlements. " +So. Cal Player Investigated in Sex Assault ," At least one member of the top-ranked Southern California football team is under investigation for sexual assault, the Los Angeles Police Department said Tuesday." +Bush Promotes His Plan for Missile Defense System,"President Bush, in Pennsylvania, said that opponents of a missile defense system were putting the nation's security at risk." 
+China Sighs in Relief as Yao Scores High,BEIJING (Reuters) - China breathed a measured sigh of relief after the skills of its basketball giant Yao Ming dwarfed New Zealand to sweep his team nearer to their goal of reaching the Athens Olympics semi-finals. +Israelis OK new homes in West Bank,"A leaked Israeli plan to build 1,000 new Jewish settler homes in the West Bank yesterday sent Bush administration officials scrambling for a response in the sensitive period before November presidential election. " +Britain accuses 8 of terror plot,"LONDON - British police charged eight terrorist suspects yesterday with conspiring to commit murder and use radioactive materials, toxic gases, chemicals or explosives to cause fear or injury. " +Israel kills 5 in strike at Hamas activist,"Islamic group armed wing, the Izz el-Deen al-Qassam Brigades. Doctors said he suffered leg wounds. " +Zambrano Out Early; So Are Mets,"ENVER, Aug. 17 - Victor Zambrano came to the Mets with radical movement on his pitches, fixable flaws in his delivery and a curious sore spot lingering around his right elbow. " +"Dollar Stuck, CPI Offers Little Direction", TOKYO (Reuters) - The dollar moved in tight ranges on Wednesday as most investors shrugged off lower-than-expected U.S. inflation data and stuck to the view the U.S. Federal Reserve would continue raising rates. +St. 
Louis Cardinals News,"Right-hander Matt Morris threw seven solid innings, but the Cardinals needed a bases-loaded walk to second baseman Tony Womack and a grand slam from new right fielder Larry Walker to key a six-run eighth inning for a " +Greek sprinters arrive at IOC hearing,"ATHENS (Reuters) - Greek sprinters Costas Kenteris and Katerina Thanou have arrived at an Athens hotel for an International Olympic Committee (IOC) hearing into their missed doped tests, a saga that has shamed and angered the Olympic host " +Flop in the ninth inning sinks Jays,BOSTON -- The Toronto Blue Jays have had worse hitting games this season against lesser pitchers than Pedro Martinez. +Fresh Fighting Shatters Short-Lived Ceasefire Deal,"Renewed clashes in South Ossetia, which resulted in death of two Georgian soldiers, erupted late on August 17, several hours after the South Ossetian and Georgian officials agreed on ceasefire. As a result Tbilisi has already announced that it will not " +Hamm hopes to get on a roll,"Paul Hamm takes another shot at history tonight, when he'll try to become the first American to win the Olympic men's all-around in gymnastics." +Karzai Promises Afghans Security for Election (Reuters),Reuters - Afghanistan's President Hamid Karzai promised Afghans greater security when they go to vote in the country's first ever democratic election during an independence day speech on Wednesday. +Google Lowers Its IPO Price Range,"SAN JOSE, Calif. - In a sign that Google Inc.'s initial public offering isn't as popular as expected, the company lowered its estimated price range to between 85 and 95 per share, down from the earlier prediction of 108 and 135 per share " +"Future Doctors, Crossing Borders",Students at the Mount Sinai School of Medicine learn that diet and culture shape health in East Harlem. 
+Oil Sets New Record 47 on Iraq Threat, LONDON (Reuters) - Oil prices surged to a new high of 47 a barrel on Wednesday after a new threat by rebel militia against Iraqi oil facilities and as the United States said inflation had stayed in check despite rising energy costs. +Greek sprinters quit to end Games scandal,ATHENS (Reuters) - Greece two top athletes have pulled out of the Athens Olympics and apologised to the Greek people for a scandal over missed dope tests that has tarnished the Games 39; return to their birthplace. +Phelps Eyes Fourth Gold," ATHENS (Reuters) - A weary Michael Phelps targeted his fourth Olympic gold medal in Athens, turning his attention on Wednesday to the 200 meters individual medley and settling for the second-fastest overall time in the heats." +Israel Kills 5 in Attempt to Assassinate Hamas Man, GAZA (Reuters) - A senior Hamas leader survived an Israeli assassination attempt in the Gaza Strip Wednesday but at least five other Palestinians were killed in the explosion that tore through his home. diff --git a/examples/milvus-text-search-engine/client/latency-test-client.py b/examples/milvus-text-search-engine/client/latency-test-client.py new file mode 100644 index 0000000000..e1320e6c7f --- /dev/null +++ b/examples/milvus-text-search-engine/client/latency-test-client.py @@ -0,0 +1,108 @@ +import requests, argparse, json, threading, queue, time, numpy +from pprint import pprint + +DATASET_PATH = "client/example.csv" +URL = "http://localhost:5000/" +ITERS = 100 +NUM_CLIENTS = 1 +SENTENCE = "The United States has brokered a cease-fire between a renegade Afghan militia leader and the embattled governor of the western province of Herat, Washington's envoy to Kabul said Tuesday." 
+QUERY = "query_sentence=" + SENTENCE + +parser = argparse.ArgumentParser() +parser.add_argument("--dataset_path", type=str, default=DATASET_PATH) +parser.add_argument("--url", type=str, default=URL) +parser.add_argument("--iters_per_client", type=int, default=ITERS) +parser.add_argument("--num_clients", type=int, default=NUM_CLIENTS) + +def setup_test(url: str, dataset_path: str): + print("\nSETUP TESTING:") + + print("\nDropping existing collections, if they exist ...") + resp = requests.post(url + "drop") + print(resp.text) + + print("\nReloading dataset into Milvus ...") + resp = requests.post(url + "load", files={'file': open(dataset_path, 'rb')}) + print(resp.text) + + print("\nConfirming 160 items in Milvus ...") + resp = requests.post(url + "count") + assert 160 == int(resp.text) + print('"Confirmed"') + + print("\nWarming up for 10 iterations + clearing latency tracker...") + for _ in range(10): + resp = requests.get(url + "search", QUERY) + assert len(json.loads(resp.text).keys()) == 9 + requests.post(url + "latency") + + print("Requests working + warmed up.") + +class ExecutorThread(threading.Thread): + def __init__(self, url:str, iters_per_client:int, time_queue:queue): + super(ExecutorThread, self).__init__() + self._url = url + self._iters = iters_per_client + self._time_queue = time_queue + + def iteration(self): + start = time.perf_counter() + resp = requests.get(self._url + "search", QUERY) + assert len(json.loads(resp.text).keys()) == 9 + end = time.perf_counter() + return start, end + + def run(self): + for _ in range(self._iters): + start, end = self.iteration() + self._time_queue.put([start, end]) + + +def run_test(url: str, num_clients:int, iters_per_client:int): + print("\nRUNNING LATENCY TEST:") + time_queue = queue.Queue() # threadsafe + + print("\nRunning Threads...") + threads = [] + for _ in range(num_clients): + threads.append(ExecutorThread(url, iters_per_client, time_queue)) + + for thread in threads: + thread.start() + + for thread 
in threads: + thread.join() + print("Done Running.") + + print("\nModel Latency Stats:") + resp = requests.post(url + "latency") + pprint(json.loads(resp.text)) + + print("\nQuery Latency Stats:") + batch_times = list(time_queue.queue) + assert len(batch_times) == iters_per_client * num_clients + batch_times_ms = [(batch_time[1] - batch_time[0]) * 1000 for batch_time in batch_times] + pprint({ + 'count': len(batch_times), + 'median': numpy.median(batch_times_ms), + 'mean': numpy.mean(batch_times_ms), + 'std': numpy.std(batch_times_ms) + }) + + print("\nQuery Output:") + resp = json.loads(requests.get(url + "search", QUERY).text) + for idx in resp: + print(resp[idx]['title']) + +if __name__ == "__main__": + args = vars(parser.parse_args()) + + # setup + warmup + setup_test(args['url'], args['dataset_path']) + + # run the actual tests + run_test( + url=args['url'], + num_clients=args['num_clients'], + iters_per_client=args['iters_per_client'] + ) \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/app-requirements.txt b/examples/milvus-text-search-engine/server/app-requirements.txt new file mode 100644 index 0000000000..467dd01ef9 --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-requirements.txt @@ -0,0 +1,8 @@ +numpy +scikit-learn +pymilvus==2.0.1 +pymysql +fastapi +uvicorn +requests +python-multipart \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/app-server/src/__init__.py b/examples/milvus-text-search-engine/server/app-server/src/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/examples/milvus-text-search-engine/server/app-server/src/app.py b/examples/milvus-text-search-engine/server/app-server/src/app.py new file mode 100644 index 0000000000..48bb7282e3 --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/app.py @@ -0,0 +1,120 @@ +import os, argparse +import uvicorn +from fastapi import FastAPI, File, UploadFile +from 
starlette.middleware.cors import CORSMiddleware +from logs import LOGGER +from milvus_helpers import MilvusHelper +from mysql_helpers import MySQLHelper +from operations.load import do_load +from operations.search import search_milvus +from operations.count import do_count +from operations.drop import do_drop +from encode import SentenceModel + +parser = argparse.ArgumentParser() +parser.add_argument("--database_host", type=str, default="127.0.0.1") +parser.add_argument("--model_host", type=str, default="127.0.0.1") + +def start_server( + database_host, + model_host, + host: str = "0.0.0.0", + port: int = 5000 +): + + MODEL = SentenceModel(model_host) + MILVUS_CLI = MilvusHelper(database_host) + MYSQL_CLI = MySQLHelper(database_host) + + app = FastAPI() + app.add_middleware( + CORSMiddleware, + allow_origins=["*"], + allow_credentials=True, + allow_methods=["*"], + allow_headers=["*"]) + + @app.post('/latency') + def compute_latency(): + # Compute latency of recent queries + reset data + try: + stats = MODEL.compute_latency() + LOGGER.info("Successfully computed recent query latency!") + return stats + except Exception as e: + LOGGER.error(e) + return {'status': False, 'msg': str(e)}, 400 + + @app.post('/count') + async def count_text(table_name: str = None): + # Returns the total number of titles in the system + try: + num = do_count(table_name, MILVUS_CLI) + LOGGER.info("Successfully counted the number of titles!") + return num + except Exception as e: + LOGGER.error(e) + return {'status': False, 'msg': str(e)}, 400 + + @app.post('/drop') + async def drop_tables(): + # Drop the Milvus collection and the MySQL table + try: + status = do_drop(MILVUS_CLI, MYSQL_CLI) + data_map = {} + LOGGER.info("Successfully dropped tables in Milvus and MySQL!") + return status + except Exception as e: + LOGGER.error(e) + return {'status': False, 'msg': str(e)}, 400 + + @app.post('/load') + async def load_text(file: UploadFile = File(...)): + data_path = None + try: + text = await file.read() + fname =
file.filename + dirs = "data" + if not os.path.exists(dirs): + os.makedirs(dirs) + data_path = os.path.join(os.getcwd(), os.path.join(dirs, fname)) + with open(data_path, 'wb') as f: + f.write(text) + except Exception: + return {'status': False, 'msg': 'Failed to load data.'} + + # Insert all data in file path to Milvus + try: + count, inference_time, db_load_time = do_load(MODEL, MILVUS_CLI, MYSQL_CLI, data_path) + LOGGER.info(f"Successfully loaded data, total count: {count}") + return f"Successfully loaded data. Inference Time {inference_time}; DB Load Time {db_load_time}" + except Exception as e: + LOGGER.error(e) + return {'status': False, 'msg': str(e)}, 400 + + + @app.get('/search') + async def do_search_api(query_sentence: str = None): + try: + ids, title, text, _ = search_milvus(query_sentence, MODEL, MILVUS_CLI, MYSQL_CLI) + res = {} + for idx, title_i, text_i in zip(ids, title, text): + res[idx] = { + 'title': title_i, + 'text' : text_i + } + LOGGER.info("Successfully searched similar text!") + return res + except Exception as e: + LOGGER.error(e) + return {'status': False, 'msg': str(e)}, 400 + + # run with 1 worker process; uvicorn.run() only supports workers > 1 when the app is passed as an import string + # note: FastAPI handles concurrent requests via a ThreadPool + # note: DeepSparse Pipelines handle concurrent inferences via a ThreadPool + # and DeepSparse engine can handle multiple input streams + uvicorn.run(app=app, host=host, port=port, workers=1) + +if __name__ == "__main__": + args = vars(parser.parse_args()) + start_server(args["database_host"], args["model_host"]) \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/app-server/src/config.py b/examples/milvus-text-search-engine/server/app-server/src/config.py new file mode 100644 index 0000000000..eaf33f7f6e --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/config.py @@ -0,0 +1,21 @@ +############### Number of log files ############### +LOGS_NUM = 0 + +############### Milvus Configuration ###############
+MILVUS_PORT = 19530 +VECTOR_DIMENSION = 768 +METRIC_TYPE = "L2" +INDEX_TYPE = "IVF_SQ8" +NLIST = 1024 +DEFAULT_TABLE = "test_table" +TOP_K = 9 +NPROBE = 10 + +############### MySQL Configuration ############### +MYSQL_PORT = 3306 +MYSQL_USER = "root" +MYSQL_PWD = "123456" +MYSQL_DB = "mysql" + +############## Model Configuration ################# +MODEL_PORT = 5543 \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/app-server/src/encode.py b/examples/milvus-text-search-engine/server/app-server/src/encode.py new file mode 100644 index 0000000000..3390936982 --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/encode.py @@ -0,0 +1,54 @@ +from queue import Queue +from typing import List +from config import MODEL_PORT +import numpy as np +import time, requests, json +from sklearn.preprocessing import normalize + +class SentenceModel: + def __init__(self, + model_host="localhost", + timing=True + ): + self._model_url = f"http://{model_host}:{MODEL_PORT}/predict" + self._timing = timing + if self._timing: + self._time_queue = Queue() + + def make_inference_request(self, data:List[str]): + obj = { + 'inputs': data + } + response = requests.post(self._model_url, json=obj) + return json.loads(response.text)["embeddings"] + + def sentence_encode(self, data:List[str], is_load=False): + start = time.perf_counter() + embedding = self.make_inference_request(data) + sentence_embeddings = normalize(np.array(embedding)).tolist() + end = time.perf_counter() + + if self._timing and not is_load: + self._time_queue.put([start, end]) + + return sentence_embeddings + + def compute_latency(self): + batch_times = list(self._time_queue.queue) + if len(batch_times) == 0: + return { + "msg" : "Latency data has been cleared" + } + + batch_times_ms = [ + (batch_time[1] - batch_time[0]) * 1000 for batch_time in batch_times + ] + + self._time_queue.queue.clear() + + return { + "count" : len(batch_times), + "median": 
np.median(batch_times_ms), + "mean": np.mean(batch_times_ms), + "std": np.std(batch_times_ms) + } \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/app-server/src/logs.py b/examples/milvus-text-search-engine/server/app-server/src/logs.py new file mode 100644 index 0000000000..d15a1114c8 --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/logs.py @@ -0,0 +1,132 @@ +import os +import re +import datetime +import logging +import sys +from config import LOGS_NUM + + +try: + import codecs +except ImportError: + codecs = None + + +class MultiprocessHandler(logging.FileHandler): + """ + File handler that switches to a new date-stamped log file when the date suffix changes + """ + def __init__(self, filename, when='D', backupCount=0, encoding=None, delay=False): + self.prefix = filename + self.backupCount = backupCount + self.when = when.upper() + self.extMath = r"^\d{4}-\d{2}-\d{2}" + + self.when_dict = { + 'S': "%Y-%m-%d-%H-%M-%S", + 'M': "%Y-%m-%d-%H-%M", + 'H': "%Y-%m-%d-%H", + 'D': "%Y-%m-%d" + } + + self.suffix = self.when_dict.get(self.when) + if not self.suffix: + print('The specified date interval unit is invalid: ', self.when) + sys.exit(1) + + self.filefmt = os.path.join('.', "logs", f'{self.prefix}-{self.suffix}.log') + + self.filePath = datetime.datetime.now().strftime(self.filefmt) + + _dir = os.path.dirname(self.filefmt) + try: + if not os.path.exists(_dir): + os.makedirs(_dir) + except Exception as e: + print('Failed to create log directory: ', e) + print("log_path:" + self.filePath) + sys.exit(1) + + if codecs is None: + encoding = None + + logging.FileHandler.__init__(self, self.filePath, 'a+', encoding, delay) + + def shouldChangeFileToWrite(self): + _filePath = datetime.datetime.now().strftime(self.filefmt) + if _filePath != self.filePath: + self.filePath = _filePath + return True + return False + + def doChangeFile(self): + self.baseFilename = os.path.abspath(self.filePath) + if self.stream: + self.stream.close() + self.stream = None + + if not self.delay: + self.stream = self._open()
+ if self.backupCount > 0: + for s in self.getFilesToDelete(): + os.remove(s) + + def getFilesToDelete(self): + dir_name, _ = os.path.split(self.baseFilename) + file_names = os.listdir(dir_name) + result = [] + prefix = self.prefix + '-' + for file_name in file_names: + if file_name[:len(prefix)] == prefix: + suffix = file_name[len(prefix):-4] + if re.compile(self.extMath).match(suffix): + result.append(os.path.join(dir_name, file_name)) + result.sort() + + if len(result) < self.backupCount: + result = [] + else: + result = result[:len(result) - self.backupCount] + return result + + def emit(self, record): + try: + if self.shouldChangeFileToWrite(): + self.doChangeFile() + logging.FileHandler.emit(self, record) + except (KeyboardInterrupt, SystemExit): + raise + except: + self.handleError(record) + + +def write_log(): + logger = logging.getLogger() + logger.setLevel(logging.DEBUG) + # formatter = '%(asctime)s | %(levelname)s | %(filename)s | %(funcName)s | %(module)s | %(lineno)s | %(message)s' + fmt = logging.Formatter( + '%(asctime)s | %(levelname)s | %(filename)s | %(funcName)s | %(lineno)s | %(message)s') + + stream_handler = logging.StreamHandler(sys.stdout) + stream_handler.setLevel(logging.INFO) + stream_handler.setFormatter(fmt) + + log_name = "milvus" + file_handler = MultiprocessHandler(log_name, when='D', backupCount=LOGS_NUM) + file_handler.setLevel(logging.DEBUG) + file_handler.setFormatter(fmt) + file_handler.doChangeFile() + + logger.addHandler(stream_handler) + logger.addHandler(file_handler) + + return logger + + +LOGGER = write_log() +# if __name__ == "__main__": +# message = 'test writing logs' +# logger = write_log() +# logger.info(message) +# logger.debug(message) +# logger.error(message) diff --git a/examples/milvus-text-search-engine/server/app-server/src/milvus_helpers.py b/examples/milvus-text-search-engine/server/app-server/src/milvus_helpers.py new file mode 100644 index 0000000000..c091b4d78c --- /dev/null +++ 
b/examples/milvus-text-search-engine/server/app-server/src/milvus_helpers.py @@ -0,0 +1,117 @@ +import sys +from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility + +from config import MILVUS_PORT, VECTOR_DIMENSION, METRIC_TYPE, INDEX_TYPE, NLIST, NPROBE +from logs import LOGGER + +class MilvusHelper: + """ + Helper for connecting to Milvus and managing, loading, and searching collections + """ + def __init__(self, milvus_host): + try: + self.collection = None + connections.connect(host=milvus_host, port=MILVUS_PORT) + LOGGER.debug(f"Successfully connected to Milvus with IP:{milvus_host} and PORT:{MILVUS_PORT}") + except Exception as e: + LOGGER.error(f"Failed to connect to Milvus: {e}") + sys.exit(1) + + def set_collection(self, collection_name): + try: + if self.has_collection(collection_name): + self.collection = Collection(name=collection_name) + else: + raise Exception(f"There is no collection named:{collection_name}") + except Exception as e: + LOGGER.error(f"Error: {e}") + sys.exit(1) + + def has_collection(self, collection_name): + # Return whether Milvus has the collection + try: + status = utility.has_collection(collection_name) + return status + except Exception as e: + LOGGER.error(f"Failed to check collection: {e}") + sys.exit(1) + + def create_collection(self, collection_name): + # Create Milvus collection if it does not exist + try: + if not self.has_collection(collection_name): + field1 = FieldSchema(name="id", dtype=DataType.INT64, description="int64", is_primary=True, auto_id=True) + field2 = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, description="float vector", dim=VECTOR_DIMENSION, is_primary=False) + schema = CollectionSchema(fields=[field1,field2], description="collection description") + self.collection = Collection(name=collection_name, schema=schema) + LOGGER.debug(f"Create Milvus collection: {self.collection}") + return "Successfully created collection" + except Exception as e: + LOGGER.error(f"Failed to create collection: {e}") + sys.exit(1) + + def insert(self,
collection_name, vectors): + # Batch insert vectors to milvus collection + try: + self.create_collection(collection_name) + self.collection = Collection(name=collection_name) + data = [vectors] + mr = self.collection.insert(data) + ids = mr.primary_keys + self.collection.load() + LOGGER.debug(f"Insert vectors to Milvus in collection: {collection_name} with {len(vectors)} rows") + return ids + except Exception as e: + LOGGER.error(f"Failed to insert data into Milvus: {e}") + sys.exit(1) + + def create_index(self, collection_name): + # Create IVF_SQ8 index on milvus collection + try: + self.set_collection(collection_name) + default_index = {"index_type": INDEX_TYPE, "metric_type": METRIC_TYPE, "params": {"nlist": NLIST}} + status = self.collection.create_index(field_name="embedding", index_params=default_index) + if not status.code: + LOGGER.debug( + f"Successfully create index in collection:{collection_name} with param:{default_index}") + return status + else: + raise Exception(status.message) + except Exception as e: + LOGGER.error(f"Failed to create index: {e}") + sys.exit(1) + + def delete_collection(self, collection_name): + # Delete Milvus collection + try: + self.set_collection(collection_name) + self.collection.drop() + LOGGER.debug("Successfully drop collection!") + return "Successfully dropped collection" + except Exception as e: + LOGGER.error(f"Failed to drop collection: {e}") + sys.exit(1) + + def search_vectors(self, collection_name, vectors, top_k): + # Search vector in milvus collection + try: + self.set_collection(collection_name) + search_params = {"metric_type": METRIC_TYPE, "params": {"nprobe": NPROBE}} + res=self.collection.search(vectors, anns_field="embedding", param=search_params, limit=top_k) + LOGGER.debug(f"Successfully search in collection: {res}") + return res + except Exception as e: + LOGGER.error(f"Failed to search in Milvus: {e}") + sys.exit(1) + + def count(self, collection_name): + # Get the number of milvus collection + try: + 
self.set_collection(collection_name) + num = self.collection.num_entities + LOGGER.debug(f"Successfully got the num:{num} of the collection:{collection_name}") + return num + except Exception as e: + LOGGER.error(f"Failed to count vectors in Milvus: {e}") + sys.exit(1) + diff --git a/examples/milvus-text-search-engine/server/app-server/src/mysql_helpers.py b/examples/milvus-text-search-engine/server/app-server/src/mysql_helpers.py new file mode 100644 index 0000000000..871e0b096f --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/mysql_helpers.py @@ -0,0 +1,99 @@ +import pymysql +import sys +from config import MYSQL_PORT, MYSQL_USER, MYSQL_PWD, MYSQL_DB +from logs import LOGGER + +class MySQLHelper(): + """ + Helper for the MySQL table that maps Milvus ids to article titles and texts + """ + def __init__(self, mysql_host): + self._mysql_host = mysql_host + self.conn = pymysql.connect(host=self._mysql_host, user=MYSQL_USER, port=MYSQL_PORT, password=MYSQL_PWD, + database=MYSQL_DB, + local_infile=True) + self.cursor = self.conn.cursor() + + def test_connection(self): + try: + self.conn.ping() + except Exception: + self.conn = pymysql.connect(host=self._mysql_host, user=MYSQL_USER, port=MYSQL_PORT, password=MYSQL_PWD, + database=MYSQL_DB,local_infile=True) + self.cursor = self.conn.cursor() + + def create_mysql_table(self, table_name): + # Create MySQL table if it does not exist + self.test_connection() + sql = "create table if not exists " + table_name + "(milvus_id TEXT, title TEXT ,text TEXT);" + try: + self.cursor.execute(sql) + LOGGER.debug(f"MYSQL create table: {table_name} with sql: {sql}") + except Exception as e: + LOGGER.error(f"MYSQL ERROR: {e} with sql: {sql}") + sys.exit(1) + + def load_data_to_mysql(self, table_name, data): + # Batch insert (milvus_id, title, text) rows into MySQL + self.test_connection() + sql = "insert into " + table_name + " (milvus_id,title,text) values (%s,%s,%s);" + try: + self.cursor.executemany(sql, data) + self.conn.commit() + LOGGER.debug(f"MYSQL loads data to table: {table_name}
successfully") + except Exception as e: + LOGGER.error(f"MYSQL ERROR: {e} with sql: {sql}") + sys.exit(1) + + def search_by_milvus_ids(self, ids, table_name): + # Get the (milvus_id, title, text) rows for the given Milvus ids, in the order of the ids + self.test_connection() + str_ids = str(ids).replace('[', '').replace(']', '') + sql = "select * from " + table_name + " where milvus_id in (" + str_ids + ") order by field (milvus_id," + str_ids + ");" + try: + self.cursor.execute(sql) + results = self.cursor.fetchall() + results_id = [res[0] for res in results] + results_title = [res[1] for res in results] + results_text = [res[2] for res in results] + LOGGER.debug("MYSQL search by milvus id.") + return results_id,results_title, results_text + except Exception as e: + LOGGER.error(f"MYSQL ERROR: {e} with sql: {sql}") + sys.exit(1) + + def delete_table(self, table_name): + # Delete MySQL table if it exists + self.test_connection() + sql = "drop table if exists " + table_name + ";" + try: + self.cursor.execute(sql) + LOGGER.debug(f"MYSQL delete table:{table_name}") + except Exception as e: + LOGGER.error(f"MYSQL ERROR: {e} with sql: {sql}") + sys.exit(1) + + def delete_all_data(self, table_name): + # Delete all the data in the MySQL table + self.test_connection() + sql = 'delete from ' + table_name + ';' + try: + self.cursor.execute(sql) + self.conn.commit() + LOGGER.debug(f"MYSQL delete all data in table:{table_name}") + except Exception as e: + LOGGER.error(f"MYSQL ERROR: {e} with sql: {sql}") + sys.exit(1) + + def count_table(self, table_name): + # Get the number of rows in the MySQL table + self.test_connection() + sql = "select count(milvus_id) from " + table_name + ";" + try: + self.cursor.execute(sql) + results = self.cursor.fetchall() + LOGGER.debug(f"MYSQL count table:{table_name}") + return results[0][0] + except Exception as e: + LOGGER.error(f"MYSQL ERROR: {e} with sql: {sql}") + sys.exit(1) \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/app-server/src/operations/__init__.py
b/examples/milvus-text-search-engine/server/app-server/src/operations/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/examples/milvus-text-search-engine/server/app-server/src/operations/count.py b/examples/milvus-text-search-engine/server/app-server/src/operations/count.py new file mode 100644 index 0000000000..91f0ae115f --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/operations/count.py @@ -0,0 +1,18 @@ +import sys + +sys.path.append("..") +from config import DEFAULT_TABLE +from logs import LOGGER + + +def do_count(table_name, milvus_cli): + if not table_name: + table_name = DEFAULT_TABLE + try: + if not milvus_cli.has_collection(table_name): + return None + num = milvus_cli.count(table_name) + return num + except Exception as e: + LOGGER.error( f"Error with count table {e}") + sys.exit(1) diff --git a/examples/milvus-text-search-engine/server/app-server/src/operations/drop.py b/examples/milvus-text-search-engine/server/app-server/src/operations/drop.py new file mode 100644 index 0000000000..df79f92de9 --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/operations/drop.py @@ -0,0 +1,15 @@ +import sys +sys.path.append("..") +from config import DEFAULT_TABLE +from logs import LOGGER + +def do_drop(milvus_cli, mysql_cli, table_name=DEFAULT_TABLE): + try: + if not milvus_cli.has_collection(table_name): + return "Collection does not exist" + status = milvus_cli.delete_collection(table_name) + mysql_cli.delete_table(table_name) + return status + except Exception as e: + LOGGER.error(f"Error with drop table: {e}") + sys.exit(1) diff --git a/examples/milvus-text-search-engine/server/app-server/src/operations/load.py b/examples/milvus-text-search-engine/server/app-server/src/operations/load.py new file mode 100644 index 0000000000..7376f05bd1 --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/operations/load.py @@ -0,0 +1,43 @@ +import sys, time +import numpy as np 
+import pandas as pd + +sys.path.append("..") +from config import DEFAULT_TABLE +from logs import LOGGER + +# Read titles and texts from the CSV and compute the text embeddings +def extract_features(path, model): + try: + data = pd.read_csv(path) + title_data = data['title'].tolist() + text_data = data['text'].tolist() + sentence_embeddings = model.sentence_encode(text_data, is_load=True) + return title_data, text_data, sentence_embeddings + except Exception as e: + LOGGER.error(f"Error extracting features from text: {e}") + sys.exit(1) + +# Format data for insertion into MySQL +def format_data_mysql(ids, title_data, text_data): + # combine the vector ids with the title and text data into a list of tuples + data = [] + for i in range(len(ids)): + value = (str(ids[i]), title_data[i], text_data[i]) + data.append(value) + return data + +# Import vectors to Milvus + create the MySQL lookup table +def do_load(embedding_model, milvus_client, mysql_client, data_path, collection_name=DEFAULT_TABLE): + start = time.perf_counter() + title_data, text_data, sentence_embeddings = extract_features(data_path, embedding_model) + end = time.perf_counter() + + start_db = time.perf_counter() + ids = milvus_client.insert(collection_name, sentence_embeddings) + milvus_client.create_index(collection_name) + mysql_client.create_mysql_table(collection_name) + mysql_client.load_data_to_mysql(collection_name, format_data_mysql(ids, title_data, text_data)) + end_db = time.perf_counter() + + return len(ids), end - start, end_db - start_db diff --git a/examples/milvus-text-search-engine/server/app-server/src/operations/search.py b/examples/milvus-text-search-engine/server/app-server/src/operations/search.py new file mode 100644 index 0000000000..0bc2bc9646 --- /dev/null +++ b/examples/milvus-text-search-engine/server/app-server/src/operations/search.py @@ -0,0 +1,19 @@ +import sys +import numpy as np + +sys.path.append("..") +from config import TOP_K, DEFAULT_TABLE +from logs import LOGGER + +def search_milvus(query_sentence, model,
milvus_cli, mysql_cli, table_name=DEFAULT_TABLE): + try: + vectors = model.sentence_encode([query_sentence]) + results = milvus_cli.search_vectors(table_name, vectors, TOP_K) + vids = [str(x.id) for x in results[0]] + ids, title, text = mysql_cli.search_by_milvus_ids(vids, table_name) + distances = [x.distance for x in results[0]] + return ids, title, text, distances + + except Exception as e: + LOGGER.error(f" Error with search : {e}") + sys.exit(1) \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/database-server/docker-compose.yml b/examples/milvus-text-search-engine/server/database-server/docker-compose.yml new file mode 100644 index 0000000000..99cb7a90ae --- /dev/null +++ b/examples/milvus-text-search-engine/server/database-server/docker-compose.yml @@ -0,0 +1,49 @@ +version: '3.5' + +services: + etcd: + container_name: milvus-etcd + image: quay.io/coreos/etcd:v3.5.0 + environment: + - ETCD_AUTO_COMPACTION_MODE=revision + - ETCD_AUTO_COMPACTION_RETENTION=1000 + - ETCD_QUOTA_BACKEND_BYTES=4294967296 + - ETCD_SNAPSHOT_COUNT=50000 + volumes: + - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd + command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd + + minio: + container_name: milvus-minio + image: minio/minio:RELEASE.2022-03-17T06-34-49Z + environment: + MINIO_ACCESS_KEY: minioadmin + MINIO_SECRET_KEY: minioadmin + volumes: + - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data + command: minio server /minio_data + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"] + interval: 30s + timeout: 20s + retries: 3 + + standalone: + container_name: milvus-standalone + image: milvusdb/milvus:v2.1.4 + command: ["milvus", "run", "standalone"] + environment: + ETCD_ENDPOINTS: etcd:2379 + MINIO_ADDRESS: minio:9000 + volumes: + - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus + ports: + - "19530:19530" + - "9091:9091" + 
depends_on: + - "etcd" + - "minio" + +networks: + default: + name: milvus diff --git a/examples/milvus-text-search-engine/server/deepsparse-requirements.txt b/examples/milvus-text-search-engine/server/deepsparse-requirements.txt new file mode 100644 index 0000000000..131cbfdda0 --- /dev/null +++ b/examples/milvus-text-search-engine/server/deepsparse-requirements.txt @@ -0,0 +1,2 @@ +deepsparse-nightly[server] +onnxruntime \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/deepsparse-server/server-config-deepsparse.yaml b/examples/milvus-text-search-engine/server/deepsparse-server/server-config-deepsparse.yaml new file mode 100644 index 0000000000..e5a31e9d09 --- /dev/null +++ b/examples/milvus-text-search-engine/server/deepsparse-server/server-config-deepsparse.yaml @@ -0,0 +1,13 @@ +num_workers: 4 + +endpoints: + - task: embedding_extraction + model: zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned80_quant-none-vnni + route: /predict + name: embedding_extraction_pipeline + batch_size: 1 + kwargs: + return_numpy: False + extraction_strategy: reduce_mean + sequence_length: 512 + engine_type: deepsparse \ No newline at end of file diff --git a/examples/milvus-text-search-engine/server/deepsparse-server/server-config-onnx.yaml b/examples/milvus-text-search-engine/server/deepsparse-server/server-config-onnx.yaml new file mode 100644 index 0000000000..cc2e4d59a4 --- /dev/null +++ b/examples/milvus-text-search-engine/server/deepsparse-server/server-config-onnx.yaml @@ -0,0 +1,13 @@ +num_workers: 4 + +endpoints: + - task: embedding_extraction + model: zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned80_quant-none-vnni + route: /predict + name: embedding_extraction_pipeline + batch_size: 1 + kwargs: + return_numpy: False + extraction_strategy: reduce_mean + sequence_length: 512 + engine_type: onnxruntime
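
A note on the metric choice in this change: `encode.py` L2-normalizes every embedding before it reaches Milvus, and `config.py` sets `METRIC_TYPE = "L2"`. For unit-length vectors the squared Euclidean distance equals `2 - 2 * cosine_similarity`, so ranking by L2 distance should give the same order as ranking by cosine similarity (taking or not taking the square root does not change the order). The sketch below checks this with the standard library only; the toy vectors and the helper names (`l2_normalize`, `squared_l2`, `cosine`) are hypothetical and not part of the PR.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (the real ones have 768 dimensions).
query = [1.0, 2.0, 0.5]
docs = {
    "a": [2.0, 4.0, 1.1],
    "b": [-1.0, 0.3, 2.0],
    "c": [0.9, 1.5, 3.0],
}

# Normalize once, as sentence_encode does before inserting or searching.
q = l2_normalize(query)
unit_docs = {name: l2_normalize(vec) for name, vec in docs.items()}

# Nearest first by L2 on normalized vectors vs. most similar first by cosine.
by_l2 = sorted(docs, key=lambda name: squared_l2(q, unit_docs[name]))
by_cosine = sorted(docs, key=lambda name: -cosine(query, docs[name]))

print(by_l2, by_cosine)
```

If the normalization step were removed from `sentence_encode`, the two rankings could diverge, so the metric type and the normalization are best changed together.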