This is a benchmark test to ensure that data engineers can show a good understanding of the fundamentals of reading, coding and delivering within a set timeframe.
A link to a public Git repository with your final solution must be provided within 48 hours of receipt of the test. Please let your talent partner know if you need additional time.
Guidelines
- To help us understand how you approach the problem, we will assess your use of source control and how you build towards the final solution, checking what is committed at each step (hint: commit and push frequently)
- The code must be written in Python 3.
- You may use any frameworks or libraries to complete this task, excluding data analysis libraries like Pandas.
- Unit tests must be provided
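Unit tests are simplest to write when the processing logic is kept apart from file I/O, so each test can pass in a few hand-made rows. A minimal sketch using the standard-library unittest module, assuming each row is a dictionary keyed by column name; count_by is a hypothetical helper and the sample rows are made up:

```python
import unittest
from collections import Counter


def count_by(rows, column):
    """Hypothetical helper: count rows per value of `column`."""
    return dict(Counter(row[column] for row in rows))


class CountByTest(unittest.TestCase):
    def test_counts_each_value(self):
        # Hand-made rows: no file access needed, so the test is fast and isolated
        rows = [
            {"OFNS_DESC": "ASSAULT 3 & RELATED OFFENSES"},
            {"OFNS_DESC": "PETIT LARCENY"},
            {"OFNS_DESC": "ASSAULT 3 & RELATED OFFENSES"},
        ]
        self.assertEqual(
            count_by(rows, "OFNS_DESC"),
            {"ASSAULT 3 & RELATED OFFENSES": 2, "PETIT LARCENY": 1},
        )

    def test_empty_input_gives_empty_dict(self):
        self.assertEqual(count_by([], "OFNS_DESC"), {})
```

Saved as, say, test_counts.py (a hypothetical file name), the tests run with python -m unittest.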
A data file will be provided alongside this test. The dataset is a CSV file containing publicly available New York Police Department (NYPD) arrest data for the first six months of 2018.
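Because Pandas is excluded, the standard-library csv module is a natural way to read the file. A minimal sketch, assuming the CSV has a header row with column names such as OFNS_DESC and PD_CD and is UTF-8 encoded; load_rows is a hypothetical name:

```python
import csv


def load_rows(path):
    """Read the whole CSV into a list of dicts keyed by the header row.

    Every value comes back as a string, exactly as csv.DictReader yields it.
    """
    # newline="" lets the csv module handle line endings itself, as its docs advise
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, rows = load_rows("nypd_arrests_2018_h1.csv"), where the file name is hypothetical. Holding every row in a list is likely fine for six months of data; iterating over the DictReader directly would keep memory flat for much larger files.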
- Load the data file, process and output the data in the forms specified
- Read in, process and present the data as specified in the requirements section
- Demonstrate the use of a list comprehension in at least one of the tasks
- Allow the user to choose, via input, whether to run the whole script or only specific sections
- Read in the attached file.
- Produce a dictionary of record counts grouped by OFNS_DESC, in descending order of count. Obtain the first 10 items from the resulting list and output them to the console.
- Obtain the count of arrests grouped by age group and PD_CD. For each age group, find the 4th-greatest number of arrests by PD_CD and output it to the console.
- Export to a CSV file the records matching a user-specified OFNS_DESC. The user can specify all or part of an offence, for example 'ASSAULT', 'ASSAULT 3' or 'ASSAULT 3 & RELATED'.
- Instantiate a SQLite database and insert all records from the original CSV into it.
- Using the SQLite database you just created, answer question 2 again, this time using only SQL.
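The OFNS_DESC count maps directly onto collections.Counter. A minimal sketch, assuming rows are dictionaries as produced by csv.DictReader; the helper names and sample rows are hypothetical. The top-10 step uses a list comprehension, which is one place the list-comprehension requirement could be met:

```python
from collections import Counter


def count_offences(rows):
    """Count records per OFNS_DESC as a dict ordered from most to least frequent."""
    counts = Counter(row["OFNS_DESC"] for row in rows)
    # most_common() sorts by count, descending; equal counts keep first-seen order,
    # and a plain dict preserves that insertion order
    return dict(counts.most_common())


def top_n_lines(counts, n=10):
    """Format the first n (offence, count) pairs with a list comprehension."""
    return [f"{offence}: {count}" for offence, count in list(counts.items())[:n]]


if __name__ == "__main__":
    sample = [
        {"OFNS_DESC": "PETIT LARCENY"},
        {"OFNS_DESC": "ASSAULT 3 & RELATED OFFENSES"},
        {"OFNS_DESC": "ASSAULT 3 & RELATED OFFENSES"},
    ]
    for line in top_n_lines(count_offences(sample)):
        print(line)
```

Run on the sample rows, this prints "ASSAULT 3 & RELATED OFFENSES: 2" followed by "PETIT LARCENY: 1".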
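The age-group ranking can be built from a Counter keyed by (age group, PD_CD) pairs. A minimal sketch, assuming the age group is stored in a column named AGE_GROUP (an assumption about the file's header) and that "4th greatest" means the fourth entry after sorting by count, with ties kept in first-appearance order rather than sharing a rank; the function name is hypothetical:

```python
from collections import Counter, defaultdict


def fourth_greatest_by_age_group(rows, rank=4):
    """For each age group, return the (PD_CD, count) pair at position `rank` by count.

    Assumes columns named AGE_GROUP and PD_CD. Age groups with fewer than
    `rank` distinct PD_CD values are left out of the result.
    """
    counts = Counter((row["AGE_GROUP"], row["PD_CD"]) for row in rows)
    by_group = defaultdict(list)
    for (age_group, pd_cd), count in counts.items():
        by_group[age_group].append((pd_cd, count))
    result = {}
    for age_group, pairs in by_group.items():
        # sorted() stays stable with reverse=True, so equal counts keep first-seen order
        ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
        if len(ranked) >= rank:
            result[age_group] = ranked[rank - 1]
    return result
```

Printing is then a loop such as for age_group, (pd_cd, count) in result.items(): print(age_group, pd_cd, count). If tied counts should share a rank, ranking the distinct count values instead is the alternative reading.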
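For the export, a case-insensitive substring match is one reasonable reading of "full or part of an offence": 'ASSAULT 3' then matches 'ASSAULT 3 & RELATED OFFENSES'. A minimal sketch using csv.DictWriter, assuming rows are dictionaries as produced by csv.DictReader; the function name and output path are hypothetical:

```python
import csv


def export_matching(rows, search, out_path):
    """Write rows whose OFNS_DESC contains `search` (case-insensitive) to out_path.

    Returns the number of rows written; nothing is written when rows is empty.
    """
    if not rows:
        return 0
    needle = search.strip().upper()
    matches = [row for row in rows if needle in row["OFNS_DESC"].upper()]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # Reuse the original header so the export has the same columns as the input
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(matches)
    return len(matches)
```

The search text can come straight from the user, for example export_matching(rows, input("Offence to export: "), "matching_offences.csv"). Note that a bare 'ASSAULT' also matches descriptions such as 'FELONY ASSAULT'; matching only at the start of the description with str.startswith is the stricter alternative.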
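For the SQLite tasks, the standard-library sqlite3 module is enough. A minimal sketch, assuming rows are dictionaries as produced by csv.DictReader and that "question 2" refers to the OFNS_DESC count; the table name arrests and both function names are hypothetical:

```python
import sqlite3


def load_into_sqlite(rows, db_path=":memory:"):
    """Create an arrests table from the CSV header and insert every row.

    Assumes rows is a non-empty list of dicts from csv.DictReader; every
    column is stored as TEXT because that is what the reader returns.
    """
    columns = list(rows[0].keys())
    conn = sqlite3.connect(db_path)
    # Header names are double-quoted so names with spaces are still valid SQL
    column_sql = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute("DROP TABLE IF EXISTS arrests")
    conn.execute(f"CREATE TABLE arrests ({column_sql})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO arrests VALUES ({placeholders})",
        [tuple(row[name] for name in columns) for row in rows],
    )
    conn.commit()
    return conn


def top_offences_sql(conn, limit=10):
    """OFNS_DESC counts in descending order, computed entirely in SQL."""
    query = """
        SELECT OFNS_DESC, COUNT(*) AS arrest_count
        FROM arrests
        GROUP BY OFNS_DESC
        ORDER BY arrest_count DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

Passing a file path such as "arrests.db" instead of the default ":memory:" keeps the database on disk. If question 2 is instead read as the age-group ranking, a ROW_NUMBER() window function (SQLite 3.25 or later) partitioned by age group is one SQL-only route.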
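To let the user run the whole script or only some sections, one simple approach is a numbered menu read with input(), with the parsing kept in its own function so it can be unit tested. A minimal sketch; the section names and the placeholder lambdas are hypothetical stand-ins for the real tasks:

```python
def parse_selection(text, available):
    """Turn input such as 'all' or '1,3' into an ordered list of section names.

    Numbers are 1-based positions in `available`; blank input means 'all'.
    """
    text = text.strip().lower()
    if text in ("", "all"):
        return list(available)
    chosen = []
    for part in text.split(","):
        index = int(part) - 1  # int() raises ValueError for non-numeric input
        if not 0 <= index < len(available):
            raise ValueError(f"no section numbered {part.strip()}")
        chosen.append(available[index])
    return chosen


def main(ask=input):
    """Show the menu, ask which sections to run, and run them in the order given."""
    # Hypothetical placeholders: replace each lambda with the real task function
    sections = {
        "OFNS_DESC counts": lambda: print("running OFNS_DESC counts"),
        "4th greatest PD_CD per age group": lambda: print("running age-group ranking"),
        "Export matching offences": lambda: print("running export"),
        "SQLite load and query": lambda: print("running SQLite tasks"),
    }
    names = list(sections)
    for number, name in enumerate(names, start=1):
        print(f"{number}. {name}")
    chosen = parse_selection(ask("Sections to run (e.g. 1,3) or 'all': "), names)
    for name in chosen:
        sections[name]()
    return chosen
```

A script would call main() under an if __name__ == "__main__": guard. Taking ask as a parameter lets a unit test supply a canned answer instead of waiting for keyboard input.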
Your code will be reviewed and assessed according to the following:
- Adherence to the requirements
- Code quality – readability, structure of the code, performance
- Unit test coverage and relevance of the tests
If you struggle to complete the test or have concerns about certain aspects, that is okay; just highlight them to us when you submit your test. Explain what you couldn't get working and the steps you took to solve the problem. Whilst we want to see the completed task, it is just as important for us to see how you approached an issue and attempted to find a solution. Do not overthink your solution: keep it simple and use what you know. Write tests only for your own code. Don't forget the README. Avoid creating additional requirements.