Not On the High Street Data Engineering Code Challenge

This is a benchmark test to ensure that data engineers can show a good understanding of the fundamentals of reading, coding and delivering to a timeframe.

Rules

A link to a public Git repository with your final solution must be provided within 48 hours of receipt of the test. Please let your talent partner know if you need additional time.

Guidelines

To help us understand how you approach the problem, we will assess your use of source control and how you build towards the final solution, checking what is committed at each step (hint: push frequently).
The code must be written in Python 3.
You may use any frameworks or libraries to complete this task, excluding data analysis libraries like Pandas.
Unit tests must be provided.

Objectives

A data file will be provided alongside this test. The dataset is a CSV containing publicly available New York Police Department (NYPD) arrest data for the first six months of 2018.
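Since Pandas is excluded, the file can be read with the standard library. The following is a minimal sketch, not a reference solution: `load_rows` is a hypothetical helper name, and the UTF-8 encoding is an assumption about the file.

```python
import csv
from pathlib import Path


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    # newline="" is the setting the csv module documentation recommends.
    # The utf-8 encoding is an assumption about the data file.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Every value comes back as a string, so numeric columns need converting explicitly if they are to be compared as numbers.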

Actions

Load the data file, then process and present the data as specified in the Requirements section
Demonstrate usage of list comprehension for at least one of the tasks
Allow user input to run all of your script, or specific sections
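One possible shape for the last two actions is sketched below, using `argparse` to select a section and a list comprehension for filtering. The task names, the `--task` and `--offence` flags, the sample rows and the case-insensitive matching are all assumptions made for illustration; the challenge does not prescribe them.

```python
import argparse
from collections import Counter

# Tiny in-memory stand-in for the real data; the values are made up.
SAMPLE_ROWS = [
    {"OFNS_DESC": "ASSAULT 3 & RELATED OFFENSES"},
    {"OFNS_DESC": "FELONY ASSAULT"},
    {"OFNS_DESC": "ROBBERY"},
    {"OFNS_DESC": "ASSAULT 3 & RELATED OFFENSES"},
]


def top_offences(rows, limit=10):
    """Return (offence, count) pairs, most frequent first."""
    return Counter(row["OFNS_DESC"] for row in rows).most_common(limit)


def filter_offences(rows, term):
    """Keep rows whose OFNS_DESC contains the term (list comprehension)."""
    # Case-insensitive matching is an assumption, not a stated requirement.
    return [row for row in rows if term.upper() in row["OFNS_DESC"].upper()]


def main(argv=None):
    """Run every section, or only the one named by --task."""
    parser = argparse.ArgumentParser(description="Run all sections or one.")
    parser.add_argument("--task", choices=["all", "top", "filter"], default="all")
    parser.add_argument("--offence", default="ASSAULT")
    args = parser.parse_args(argv)

    results = {}
    if args.task in ("all", "top"):
        results["top"] = top_offences(SAMPLE_ROWS)
    if args.task in ("all", "filter"):
        results["filter"] = filter_offences(SAMPLE_ROWS, args.offence)
    return results
```

Calling `main(["--task", "filter", "--offence", "ASSAULT 3"])` runs only the filter section; calling `main([])` runs both.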

Requirements

1. Read in the attached file.
   a. Produce a dictionary of record counts grouped by OFNS_DESC, in descending order of count.
   b. Obtain the first 10 items from the resultant list and output them to the console.
2. Obtain the count of arrests grouped by age group and PD_CD. Find the 4th greatest number of arrests by PD_CD for each age group and output it to the console.
3. Export the records matching a user-specified OFNS_DESC to a CSV file. The user can specify all or part of an offence, for example 'ASSAULT', 'ASSAULT 3' or 'ASSAULT 3 & RELATED'.
4. Instantiate a SQLite database and insert all records from the original CSV into it.
5. Using the SQLite database you just created, answer question 2 again, this time using only SQL.
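As a rough illustration of the SQL-only version of the age group and PD_CD question, the sketch below loads a few made-up rows into an in-memory SQLite database and picks the fourth-ranked PD_CD per age group. It makes several assumptions: the column name `AGE_GROUP`, the table name `arrests`, ties broken by PD_CD via `ROW_NUMBER` (a `DENSE_RANK` reading of "4th greatest" is equally defensible), and SQLite 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical sample rows as (AGE_GROUP, PD_CD); the values are made up.
SAMPLE = (
    [("18-24", "101")] * 5
    + [("18-24", "102")] * 4
    + [("18-24", "103")] * 3
    + [("18-24", "104")] * 2
    + [("18-24", "105")] * 1
    + [("25-44", "101")] * 2
)

# Count arrests per (AGE_GROUP, PD_CD), rank within each age group,
# then keep the fourth row. Requires SQLite 3.25+ for ROW_NUMBER().
FOURTH_BY_AGE_GROUP = """
WITH counts AS (
    SELECT AGE_GROUP, PD_CD, COUNT(*) AS arrests
    FROM arrests
    GROUP BY AGE_GROUP, PD_CD
),
ranked AS (
    SELECT AGE_GROUP, PD_CD, arrests,
           ROW_NUMBER() OVER (
               PARTITION BY AGE_GROUP
               ORDER BY arrests DESC, PD_CD
           ) AS rn
    FROM counts
)
SELECT AGE_GROUP, PD_CD, arrests
FROM ranked
WHERE rn = 4
ORDER BY AGE_GROUP
"""


def fourth_by_age_group(rows):
    """Load rows into an in-memory database and run the ranking query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE arrests (AGE_GROUP TEXT, PD_CD TEXT)")
        conn.executemany("INSERT INTO arrests VALUES (?, ?)", rows)
        return conn.execute(FOURTH_BY_AGE_GROUP).fetchall()
    finally:
        conn.close()
```

With the sample rows, `fourth_by_age_group(SAMPLE)` returns `[("18-24", "104", 2)]`; the "25-44" group has fewer than four distinct PD_CD values, so it yields no row.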

Assessment

Your code will be reviewed and assessed according to the following:

Adherence to the requirements
Code quality – readability, structure of the code, performance
Unit test coverage and relevance of the tests

Helpful Tips

If you struggle to complete the test or have concerns about certain aspects, that is okay; just highlight it to us when you submit your test. Explain what you couldn't get working and the steps you took to solve the problem. Whilst we want to see the completed task, it is just as important for us to see how you approached an issue and attempted to find a solution.
Do not overthink your solution. Keep it simple and use what you know.
Write tests only for your code.
Don't forget the ReadMe.
Avoid creating additional requirements.
