This library processes real-world athlete activity data (summary, laps, samples) with a strong focus on data consistency and error handling. The design is flexible and gives users detailed feedback if data is inconsistent or invalid.
The library uses four validation services to handle different aspects of the data: one each for the summary, laps, and samples, plus a cross-validation service that checks them against each other.
- Separation of Concerns: Each part of the data is validated separately, keeping the logic clear and maintainable.
- User Customization: You can adjust validation rules for different use cases. For example, you might want to customize the threshold for max heart rate or decide how to handle non-parsable heart rate readings (e.g., replacing `NaN` with 0 or discarding them).
- Cross-Validation: Besides validating data individually, cross-validation ensures that relationships between the summary, laps, and samples make sense. This validation can be customized, so you're free to apply your own consistency rules.
This approach gives you the flexibility to work with real-world, often messy, data. You can plug in different validation strategies depending on your requirements, such as stricter thresholds or specific error handling strategies.
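As a rough illustration of what a pluggable validation rule could look like, here is a minimal sketch. All names in it (`SampleValidationOptions`, `HeartRatePolicy`, `validateHeartRates`) are hypothetical and are not taken from the library's actual API; it only shows the idea of a configurable max heart rate threshold and a configurable policy for unparsable readings.

```typescript
// Hypothetical names throughout: this is not the library's real API.
type HeartRatePolicy = "replaceWithZero" | "discard";

interface SampleValidationOptions {
  maxHeartRate: number; // readings above this are reported as errors
  onUnparsable: HeartRatePolicy; // how to treat tokens such as "null"
}

interface ValidationResult {
  values: number[];
  errors: string[];
}

function validateHeartRates(
  raw: string,
  options: SampleValidationOptions,
): ValidationResult {
  const values: number[] = [];
  const errors: string[] = [];

  raw.split(",").forEach((token, index) => {
    const trimmed = token.trim();
    const parsed = Number.parseInt(trimmed, 10);

    if (Number.isNaN(parsed)) {
      // Always report the problem; keep a placeholder only if configured to.
      errors.push(`sample ${index}: unparsable value "${trimmed}"`);
      if (options.onUnparsable === "replaceWithZero") values.push(0);
      return;
    }

    if (parsed > options.maxHeartRate) {
      // The value is kept, but flagged so the caller can decide what to do.
      errors.push(`sample ${index}: ${parsed} exceeds max ${options.maxHeartRate}`);
    }
    values.push(parsed);
  });

  return { values, errors };
}
```

Returning the errors alongside the cleaned values, rather than throwing on the first problem, is one way to give callers the detailed feedback described above.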
- CI/CD Pipeline: A GitHub Actions CI pipeline ensures that all commits are automatically tested, making the project stable and reliable.
- High Code Coverage: Integrated with Coveralls, code coverage is measured for every build and displayed via a badge.
- Strong Typing: Everything is strongly typed with TypeScript, reducing errors and making it easier for developers to use the library confidently.
- Custom Jest Matcher: I’ve added a custom Jest matcher for clearer, more readable tests.
expect(heartRateSamples).toMatchHeartRateSeries(116, 117, 118);
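For readers unfamiliar with custom matchers, the sketch below shows how a matcher like this could be written. It is an assumption about the shape of the implementation, not the repository's actual code: the `HeartRateSample` interface and its field names are illustrative. The only Jest facts relied on are that a matcher returns an object with `pass` and `message`, and that matchers are registered with `expect.extend`.

```typescript
// Assumed sample shape; the field names are illustrative.
interface HeartRateSample {
  sampleIndex: number;
  heartRate: number | null;
}

// A Jest matcher is a function that returns { pass, message }.
function toMatchHeartRateSeries(
  received: HeartRateSample[],
  ...expected: number[]
) {
  const actual = received.map((sample) => sample.heartRate);
  const pass =
    actual.length === expected.length &&
    actual.every((value, index) => value === expected[index]);

  return {
    pass,
    // Jest shows this message when the assertion (or its .not form) fails.
    message: () =>
      `expected heart rates [${actual.join(", ")}] ` +
      `${pass ? "not " : ""}to match [${expected.join(", ")}]`,
  };
}

// Registration would happen in a Jest setup file (left as a comment here
// so the snippet runs without Jest installed):
// expect.extend({ toMatchHeartRateSeries });
```

In a TypeScript project the matcher would also need a type declaration that extends Jest's matcher interface so that `expect(...).toMatchHeartRateSeries(...)` type-checks.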
This design gives flexibility while ensuring robustness, making it a powerful tool for processing real-world athlete data.
- Implement your solution using one of the following general-purpose programming languages: Java, Kotlin, C#, Python, JavaScript, or TypeScript. JavaScript or TypeScript is preferred.
- Ensure that your solution achieves at least 80% test coverage.
- We value both enterprise-level robustness and simplicity in your code, so please strive to balance these aspects.
- Submit your solution by pushing it to a Git repository and sharing the link with us.
You are tasked with developing a library that processes data from professional athletes' sports computers. The objective is to create a library capable of loading three types of input data, each through a separate method, and then processing this data. The final output should be a unified JSON file that consolidates all three datasets, which will then be used by the science team for further analysis.
- Summary: A basic activity summary including type, average values, etc.
- Laps: Detailed descriptions of laps, including time, distance, and duration.
- Samples: A collection of detailed recorded values grouped by sample types.
The resulting JSON should include the following:
- Activity Overview: Key details such as userId, type, device, max heart rate, and duration.
- Laps Data: For each lap, include start time, distance, duration, and detailed heart rate samples. Heart rate samples should be presented as an array of objects containing two keys: sample index and heart rate.
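The output described above could be typed roughly as follows. This is only a sketch: the input field names come from the sample summary in this document, while the output names (`ActivityOverview`, `type`, `device`, `maxHeartRate`, `LapOutput`, `sampleIndex`, `heartRate`) are assumptions, since the task does not prescribe exact keys.

```typescript
// Input fields as they appear in the sample summary.
interface Summary {
  userId: string;
  activityType: string;
  deviceName: string;
  maxHeartRateInBeatsPerMinute: number;
  durationInSeconds: number;
}

// Output key names are assumptions made for this sketch.
interface ActivityOverview {
  userId: string;
  type: string;
  device: string;
  maxHeartRate: number;
  durationInSeconds: number;
}

interface HeartRatePoint {
  sampleIndex: number;
  heartRate: number | null;
}

interface LapOutput {
  startTimeInSeconds: number;
  totalDistanceInMeters: number;
  timerDurationInSeconds: number;
  heartRateSamples: HeartRatePoint[];
}

interface ActivityOutput {
  activityOverview: ActivityOverview;
  laps: LapOutput[];
}

// Maps the summary onto the overview part of the unified output.
function buildOverview(summary: Summary): ActivityOverview {
  return {
    userId: summary.userId,
    type: summary.activityType,
    device: summary.deviceName,
    maxHeartRate: summary.maxHeartRateInBeatsPerMinute,
    durationInSeconds: summary.durationInSeconds,
  };
}
```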
Sample Processing Guidelines:
- Heart rate samples are identified as type 2.
- For each lap of type INDOOR_CYCLING, there are two consecutive objects of samples in the samples array.
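The two guidelines above can be sketched as a small grouping function. The function name `heartRateByLap` and the output shape are hypothetical. The sketch also assumes that the sample index restarts at 0 for each lap and runs across both sample objects, which the task does not state explicitly, and it maps unparsable tokens such as `null` to `null`.

```typescript
// Raw sample object, with keys as they appear in the samples data.
interface RawSample {
  "recording-rate": number;
  "sample-type": string;
  data: string;
}

interface HeartRatePoint {
  sampleIndex: number;
  heartRate: number | null;
}

const HEART_RATE_TYPE = "2";
const SAMPLE_OBJECTS_PER_LAP = 2; // stated for INDOOR_CYCLING laps

function heartRateByLap(samples: RawSample[], lapCount: number): HeartRatePoint[][] {
  // Keep only heart rate samples, preserving their original order.
  const heartRate = samples.filter((s) => s["sample-type"] === HEART_RATE_TYPE);
  const laps: HeartRatePoint[][] = [];

  for (let lap = 0; lap < lapCount; lap++) {
    // Each lap owns two consecutive heart rate sample objects.
    const chunk = heartRate.slice(
      lap * SAMPLE_OBJECTS_PER_LAP,
      (lap + 1) * SAMPLE_OBJECTS_PER_LAP,
    );

    const tokens: string[] = [];
    for (const sample of chunk) {
      tokens.push(...sample.data.split(","));
    }

    laps.push(
      tokens.map((token, sampleIndex) => {
        const parsed = Number.parseInt(token.trim(), 10);
        return { sampleIndex, heartRate: Number.isNaN(parsed) ? null : parsed };
      }),
    );
  }

  return laps;
}
```

Applied to the sample data in this document with two laps, this would give each lap 14 heart rate points, with the `null` reading preserved as `null` in the second lap.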
Your goal is to design, implement, test, and document a methodology for pre-processing and modelling the heart rate measurements within a lap. The pre-processing part should cover outlier identification and cleaning. The initial recording rate is set to 5, where each observation is the median aggregate of five heart rate tick measurements. You need to reverse the aggregation step and backward-interpolate the observations so that you end up with 5 * (n-1) heart rate measurements at a corresponding recording rate of 1, where n denotes the initial number of observations. The modelling part should cover both model training and testing. The model should perform well at predicting the median of the next five consecutive heart rate tick values. Elaborate on the error metric you chose for model validation.
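One possible reading of the interpolation step is sketched below. It assumes plain linear interpolation, where each gap between two consecutive observations is filled with five tick values that end on the later observation; this yields 5 * (n-1) values, but the task does not confirm that this is the intended method. The function names are hypothetical, and mean absolute error is shown only as one candidate error metric.

```typescript
const TICKS_PER_OBSERVATION = 5;

// Expands n observations into 5 * (n - 1) tick values by linear interpolation.
// Assumes outliers and missing values have already been cleaned.
function interpolateTicks(observations: number[]): number[] {
  const ticks: number[] = [];
  for (let i = 1; i < observations.length; i++) {
    const previous = observations[i - 1];
    const step = (observations[i] - previous) / TICKS_PER_OBSERVATION;
    // Five ticks per interval; the fifth one equals the later observation.
    for (let k = 1; k <= TICKS_PER_OBSERVATION; k++) {
      ticks.push(previous + step * k);
    }
  }
  return ticks;
}

// Mean absolute error: average size of the prediction error, in beats per minute.
function meanAbsoluteError(predicted: number[], actual: number[]): number {
  if (predicted.length !== actual.length || predicted.length === 0) {
    throw new Error("inputs must be non-empty and of equal length");
  }
  let total = 0;
  for (let i = 0; i < predicted.length; i++) {
    total += Math.abs(predicted[i] - actual[i]);
  }
  return total / predicted.length;
}
```

Mean absolute error is expressed in the same unit as the heart rate and is less sensitive to single large errors than a squared-error metric, which is one argument for it when the prediction target is a median; other metrics may suit the task equally well.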
Summary
{
"userId": "1234567890",
"activityId": 9480958402,
"activityName": "Indoor Cycling",
"durationInSeconds": 3667,
"startTimeInSeconds": 1661158927,
"startTimeOffsetInSeconds": 7200,
"activityType": "INDOOR_CYCLING",
"averageHeartRateInBeatsPerMinute": 150,
"activeKilocalories": 561,
"deviceName": "instinct2",
"maxHeartRateInBeatsPerMinute": 190
}
Samples data
[
{
"recording-rate": 5,
"sample-type": "0",
"data": "86,87,88,88,88,90,91"
},
{
"recording-rate": 5,
"sample-type": "2",
"data": "120,126,122,140,142,155,145"
},
{
"recording-rate": 5,
"sample-type": "2",
"data": "141,147,155,160,180,152,120"
},
{
"recording-rate": 5,
"sample-type": "0",
"data": "86,87,88,88,88,90,91"
},
{
"recording-rate": 5,
"sample-type": "1",
"data": "143,87,88,88,88,90,91"
},
{
"recording-rate": 5,
"sample-type": "2",
"data": "143,151,164,null,173,181,180"
},
{
"recording-rate": 5,
"sample-type": "2",
"data": "182,170,188,181,174,172,158"
},
{
"recording-rate": 5,
"sample-type": "3",
"data": "143,87,88,88,88,90,91"
}
]
Laps
[
{
"startTimeInSeconds": 1661158927,
"airTemperatureCelsius": 28,
"heartRate": 109,
"totalDistanceInMeters": 15,
"timerDurationInSeconds": 600
},
{
"startTimeInSeconds": 1661158929,
"airTemperatureCelsius": 28,
"heartRate": 107,
"totalDistanceInMeters": 30,
"timerDurationInSeconds": 900
}
]