Hotels Analysis

Input:

A JSON file about hotels in India with 9 fields and 305 rows. The number of rows is subject to change, so the code must scale with the size of the data. The fields are as follows:

HOTEL NAME
ADDRESS
STATE
PHONE
FAX
EMAIL ID
WEBSITE
TYPE
Rooms

Functionality:

The solution processes the JSON file and extracts the data into a pandas dataframe. From the dataframe, the hotel with the maximum number of rooms under each type is computed. Below are possible approaches:

The naive approach is to load the entire dataset into memory and run the computation there.
Increase the number of machines and distribute the data among them for processing.
Divide the data into chunks that fit into memory and process the chunks sequentially.
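
As an illustration of the first approach, here is a minimal sketch rather than the code from the notebooks. It assumes the JSON is a flat list of records keyed by the field names above. The sample rows and the function name max_rooms_per_type are made up for the example.

```python
import json

import pandas as pd

# Hypothetical sample in a records layout; the real file has 9 fields per hotel.
raw = """[
  {"HOTEL NAME": "A", "TYPE": "3 Star", "Rooms": "40"},
  {"HOTEL NAME": "B", "TYPE": "3 Star", "Rooms": "75"},
  {"HOTEL NAME": "C", "TYPE": "5 Star", "Rooms": "120"},
  {"HOTEL NAME": "D", "TYPE": "5 Star", "Rooms": "90"}
]"""


def max_rooms_per_type(df):
    """Return one row per TYPE: the hotel with the most rooms."""
    df = df.copy()
    # Rooms may arrive as text; coerce bad values to NaN and drop them.
    df["Rooms"] = pd.to_numeric(df["Rooms"], errors="coerce")
    df = df.dropna(subset=["Rooms", "TYPE"])
    # idxmax gives the index label of the largest Rooms value in each group.
    idx = df.groupby("TYPE")["Rooms"].idxmax()
    return df.loc[idx].reset_index(drop=True)


hotels = pd.DataFrame(json.loads(raw))
result = max_rooms_per_type(hotels)
print(result)
```

If two hotels of the same type tie on room count, idxmax keeps the first one it encounters.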

The submitted solution contains the first and third approaches. The first approach is in the notebook hotels_analysis.ipynb and the third approach is in the notebook hotels_analysis_large_files.ipynb.
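
The chunked approach can be sketched as follows. This is an assumption about how such processing might look, not the notebook's actual code. The function name max_rooms_per_type_chunked and the sample rows are hypothetical, and the chunks are simulated by slicing an in-memory dataframe. For a file in JSON Lines format, pd.read_json(path, lines=True, chunksize=n) yields dataframe chunks that could be passed in the same way.

```python
import pandas as pd


def max_rooms_per_type_chunked(chunks):
    """Keep only the current best hotel per TYPE while iterating over chunks."""
    best = None
    for chunk in chunks:
        chunk = chunk.copy()
        chunk["Rooms"] = pd.to_numeric(chunk["Rooms"], errors="coerce")
        chunk = chunk.dropna(subset=["Rooms", "TYPE"])
        if chunk.empty:
            continue
        # Compare the new rows against the winners found so far.
        if best is None:
            candidates = chunk
        else:
            candidates = pd.concat([best, chunk], ignore_index=True)
        idx = candidates.groupby("TYPE")["Rooms"].idxmax()
        best = candidates.loc[idx].reset_index(drop=True)
    return best


# Hypothetical sample data, split into chunks of two rows each.
hotels = pd.DataFrame(
    [
        {"HOTEL NAME": "A", "TYPE": "3 Star", "Rooms": 40},
        {"HOTEL NAME": "B", "TYPE": "3 Star", "Rooms": 75},
        {"HOTEL NAME": "C", "TYPE": "5 Star", "Rooms": 120},
        {"HOTEL NAME": "D", "TYPE": "5 Star", "Rooms": 90},
        {"HOTEL NAME": "E", "TYPE": "3 Star", "Rooms": 80},
    ]
)
chunk_size = 2
chunks = (hotels.iloc[i:i + chunk_size] for i in range(0, len(hotels), chunk_size))
result = max_rooms_per_type_chunked(chunks)
print(result)
```

Only one row per hotel type is held between chunks, so memory use depends on the chunk size and the number of types, not on the total number of rows.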

Output:

The computed output dataframe is stored as a CSV file. The output files are result.csv and result_1.csv.
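
Writing the result out might look like the sketch below. The dataframe contents are made up, and the file is written to a temporary directory so that the example does not overwrite the real result.csv.

```python
import os
import tempfile

import pandas as pd

# Hypothetical result dataframe: one row per hotel type.
result = pd.DataFrame(
    {
        "HOTEL NAME": ["B", "C"],
        "TYPE": ["3 Star", "5 Star"],
        "Rooms": [75, 120],
    }
)

# index=False keeps the dataframe's row index out of the CSV.
out_path = os.path.join(tempfile.mkdtemp(), "result.csv")
result.to_csv(out_path, index=False)

# Read the file back to confirm it round-trips.
round_trip = pd.read_csv(out_path)
print(round_trip)
```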
