GitHub - piorot/athletes-activity-processor: Library that processes data from professional athletes' sports computers

DataProcessor Library Overview

This library processes real-world athlete activity data (summary, laps, samples) with a strong focus on data consistency and error handling. The design is flexible and gives users detailed feedback if data is inconsistent or invalid.

Why Inject Multiple Validation Services?

The library uses four validation services: one for each data type (summary, laps, samples) and one for cross-validation between them. Each service handles a different aspect of the data.

Here's why it's a good approach:

Separation of Concerns: Each part of the data is validated separately, keeping the logic clear and maintainable.
User Customization: You can adjust validation rules for different use cases. For example, you might want to customize the threshold for max heart rate or decide how to handle non-parsable heart rate readings (e.g., replacing NaN with 0 or discarding them).
Cross-Validation: Besides validating data individually, cross-validation ensures that relationships between the summary, laps, and samples make sense. This validation can be customized, so you're free to apply your own consistency rules.
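As a rough illustration of the points above, the sketch below injects two validators (one per-type, one cross-validator) into a processor. Every name in it (`Validator`, `ValidationResult`, `SummaryValidator`, `LapDurationCrossValidator`, `DataProcessor`) is hypothetical and is not taken from the library's actual API; the laps and samples validators would presumably follow the same pattern.

```typescript
// Hypothetical sketch: every name here is an assumption, not the library's real API.
interface ValidationResult {
  valid: boolean;
  errors: string[];
}

interface Validator<T> {
  validate(input: T): ValidationResult;
}

interface Summary {
  maxHeartRateInBeatsPerMinute: number;
  durationInSeconds: number;
}

interface Lap {
  timerDurationInSeconds: number;
}

interface ActivityData {
  summary: Summary;
  laps: Lap[];
}

// Per-type validator with a user-configurable max heart rate threshold.
class SummaryValidator implements Validator<Summary> {
  constructor(private readonly maxHeartRate: number = 220) {}

  validate(summary: Summary): ValidationResult {
    const errors =
      summary.maxHeartRateInBeatsPerMinute > this.maxHeartRate
        ? [`max heart rate above ${this.maxHeartRate} bpm`]
        : [];
    return { valid: errors.length === 0, errors };
  }
}

// Cross-validator: checks a relationship between two datasets.
class LapDurationCrossValidator implements Validator<ActivityData> {
  validate(data: ActivityData): ValidationResult {
    const lapTotal = data.laps.reduce(
      (sum, lap) => sum + lap.timerDurationInSeconds,
      0,
    );
    const errors =
      lapTotal > data.summary.durationInSeconds
        ? ["laps last longer than the whole activity"]
        : [];
    return { valid: errors.length === 0, errors };
  }
}

// The processor receives its validators instead of constructing them,
// so callers can swap in stricter or looser rules.
class DataProcessor {
  constructor(
    private readonly summaryValidator: Validator<Summary>,
    private readonly crossValidator: Validator<ActivityData>,
  ) {}

  validate(data: ActivityData): ValidationResult {
    const results = [
      this.summaryValidator.validate(data.summary),
      this.crossValidator.validate(data),
    ];
    const errors = results.reduce<string[]>(
      (all, result) => all.concat(result.errors),
      [],
    );
    return { valid: errors.length === 0, errors };
  }
}
```

Passing `new SummaryValidator(180)` instead of the default would reject the sample summary further down (max heart rate 190), which is the kind of per-use-case tuning the list describes.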

Flexibility with Real Data

This approach gives you the flexibility to work with real-world, often messy, data. You can plug in different validation strategies depending on your requirements—whether that’s stricter thresholds or specific error handling strategies.

Key Features

CI/CD Pipeline: A GitHub Actions CI pipeline runs the test suite automatically on every commit.
High Code Coverage: Code coverage is measured on every build through Coveralls and displayed via a badge.
Strong Typing: Everything is strongly typed with TypeScript, reducing errors and making it easier for developers to use the library confidently.
Custom Jest Matcher: I’ve added a custom Jest matcher for clearer, more readable tests, for example: expect(heartRateSamples).toMatchHeartRateSeries(116, 117, 118);
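For readers unfamiliar with custom matchers, here is a minimal sketch of how a matcher like `toMatchHeartRateSeries` could be written. The sample shape (`sampleIndex`, `heartRate`) and the matcher body are assumptions; the repository's actual implementation may differ.

```typescript
// Hypothetical sketch; the sample shape and matcher body are assumptions.
interface HeartRateSample {
  sampleIndex: number;
  heartRate: number;
}

// A Jest custom matcher is a function that returns { pass, message }.
function toMatchHeartRateSeries(
  received: HeartRateSample[],
  ...expected: number[]
): { pass: boolean; message: () => string } {
  const actual = received.map(sample => sample.heartRate);
  const pass =
    actual.length === expected.length &&
    actual.every((value, i) => value === expected[i]);
  return {
    pass,
    message: () =>
      `expected heart rates [${actual.join(", ")}] ` +
      `${pass ? "not " : ""}to equal [${expected.join(", ")}]`,
  };
}

// In a Jest setup file the matcher would be registered with:
// expect.extend({ toMatchHeartRateSeries });
```

In a TypeScript project the matcher also needs a type declaration that extends Jest's matcher interface, which is omitted here.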

This design gives flexibility while ensuring robustness, making it a powerful tool for processing real-world athlete data.

Solution status

Implement your solution using one of the following general-purpose programming languages: Java, Kotlin, C#, Python, JavaScript, or TypeScript. JavaScript or TypeScript is preferred.
Ensure that your solution achieves at least 80% test coverage.
We value both enterprise-level robustness and simplicity in your code, so please strive to balance these aspects.
Submit your solution by pushing it to a Git repository and sharing the link with us.

Task description:

You are tasked with developing a library that processes data from professional athletes' sports computers. The objective is to create a library capable of loading three types of input data, each through a separate method, and then processing this data. The final output should be a unified JSON file that consolidates all three datasets, which will then be used by the science team for further analysis.
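One possible shape for such a library is sketched below: three separate load methods and a final processing step that refuses to run until all three datasets are present. The class name `ActivityProcessor` and its method names are hypothetical, and the consolidation step is only a placeholder.

```typescript
// Hypothetical sketch; the class and method names are assumptions.
type Json = Record<string, unknown>;

class ActivityProcessor {
  private summary?: Json;
  private laps?: Json[];
  private samples?: Json[];

  loadSummary(summary: Json): this {
    this.summary = summary;
    return this;
  }

  loadLaps(laps: Json[]): this {
    this.laps = laps;
    return this;
  }

  loadSamples(samples: Json[]): this {
    this.samples = samples;
    return this;
  }

  // Produces the unified JSON once all three datasets are loaded.
  process(): string {
    if (!this.summary || !this.laps || !this.samples) {
      throw new Error(
        "summary, laps and samples must all be loaded before processing",
      );
    }
    // Placeholder consolidation: a real implementation would validate
    // and reshape the data here rather than copy it through.
    return JSON.stringify({
      summary: this.summary,
      laps: this.laps,
      samples: this.samples,
    });
  }
}
```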

Data Set Characteristics:

Summary: A basic activity summary including type, average values, etc.
Laps: Detailed descriptions of laps, including time, distance, and duration.
Samples: A collection of detailed recorded values grouped by sample types.

Output JSON Requirements:

The resulting JSON should include the following:

Activity Overview: Key details such as userId, type, device, max heart rate, and duration.
Laps Data: For each lap, include start time, distance, duration, and detailed heart rate samples. Heart rate samples should be presented as an array of objects containing two keys: sample index and heart rate.
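The requirements above could be captured as TypeScript types along the following lines. The field names are assumptions derived from the requirement list, not a confirmed output schema; the example values are taken from the sample data further down.

```typescript
// Hypothetical field names, derived from the requirement list above.
interface HeartRateSample {
  sampleIndex: number;
  heartRate: number;
}

interface LapOutput {
  startTime: number;
  distance: number;
  duration: number;
  heartRateSamples: HeartRateSample[];
}

interface ActivityOutput {
  activityOverview: {
    userId: string;
    type: string;
    device: string;
    maxHeartRate: number;
    duration: number;
  };
  lapsData: LapOutput[];
}

// Example value built from the sample summary, laps and samples data.
const exampleOutput: ActivityOutput = {
  activityOverview: {
    userId: "1234567890",
    type: "INDOOR_CYCLING",
    device: "instinct2",
    maxHeartRate: 190,
    duration: 3667,
  },
  lapsData: [
    {
      startTime: 1661158927,
      distance: 15,
      duration: 600,
      heartRateSamples: [
        { sampleIndex: 0, heartRate: 120 },
        { sampleIndex: 1, heartRate: 126 },
      ],
    },
  ],
};
```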

Sample Processing Guidelines:

Heart rate samples are identified as type 2.
For each lap of type INDOOR_CYCLING, there are two consecutive objects of samples in the samples array.
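A minimal sketch of these two guidelines follows. It assumes that the heart rate objects appear in lap order, so the first two type-2 objects belong to the first lap, the next two to the second, and so on. It also assumes that non-parsable readings are kept as `null`; replacing or discarding them would be an equally valid strategy. All function and type names are hypothetical.

```typescript
// Hypothetical sketch of the two guidelines above; names are assumptions.
interface RawSample {
  "recording-rate": number;
  "sample-type": string;
  data: string;
}

interface HeartRateSample {
  sampleIndex: number;
  heartRate: number | null; // null marks a non-parsable reading
}

const HEART_RATE_TYPE = "2";
const SAMPLE_OBJECTS_PER_LAP = 2;

// Splits a comma-separated data string; anything non-numeric becomes null.
function parseReadings(data: string): (number | null)[] {
  return data.split(",").map(raw => {
    const text = raw.trim();
    const value = text === "" ? NaN : Number(text);
    return isFinite(value) ? value : null;
  });
}

// Returns one array of indexed heart rate samples per lap.
function heartRateSamplesPerLap(
  samples: RawSample[],
  lapCount: number,
): HeartRateSample[][] {
  const heartRateObjects = samples.filter(
    sample => sample["sample-type"] === HEART_RATE_TYPE,
  );
  const result: HeartRateSample[][] = [];
  for (let lap = 0; lap < lapCount; lap++) {
    const start = lap * SAMPLE_OBJECTS_PER_LAP;
    const readings = heartRateObjects
      .slice(start, start + SAMPLE_OBJECTS_PER_LAP)
      .reduce<(number | null)[]>(
        (all, sample) => all.concat(parseReadings(sample.data)),
        [],
      );
    result.push(
      readings.map((heartRate, sampleIndex) => ({ sampleIndex, heartRate })),
    );
  }
  return result;
}
```

With the sample data below (four type-2 objects, two laps), each lap would receive 14 readings, and the `null` entry in the third heart rate object would surface as a `null` heart rate in the second lap.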

Technical Requirements:

Implement your solution using one of the following general-purpose programming languages: Java, Kotlin, C#, Python, JavaScript, or TypeScript. JavaScript or TypeScript is preferred.
Ensure that your solution achieves at least 80% test coverage.
We value both enterprise-level robustness and simplicity in your code, so please strive to balance these aspects.
Submit your solution by pushing it to a Git repository and sharing the link with us.

Bonus points:

Your goal is to design, implement, test and document a methodology for pre-processing and modelling of the heart rate measurements within a lap. The pre-processing part should cover outlier identification and cleaning. The initial recording rate is set to 5, and each observation is a median aggregate of 5 tick heart rate measurements. You need to reverse the aggregation step and backward interpolate the observations in such a way that you end up with 5 * (n-1) heart rate measurements with a corresponding recording rate of 1, where n denotes the initial number of observations. The modelling part should cover both model training and testing steps. The model should do well at predicting the median of the next five consecutive heart rate tick values. Elaborate on the error metric you chose for the model validation.
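To make the 5 * (n-1) count concrete, here is a minimal sketch of one way to read the interpolation step: each pair of consecutive observations is joined by a straight line, and the five ticks leading up to (and including) the later observation are filled in. Linear interpolation is an assumption; the task does not prescribe a method. The mean absolute error function is likewise just one candidate metric, chosen here because it is expressed in beats per minute. Outlier cleaning and the model itself are not covered by this sketch.

```typescript
// Hypothetical sketch: linear interpolation is an assumed reading of
// "backward interpolate"; the task does not prescribe a method.
function backwardInterpolate(observations: number[], rate: number = 5): number[] {
  const ticks: number[] = [];
  for (let i = 0; i + 1 < observations.length; i++) {
    const from = observations[i];
    const to = observations[i + 1];
    // Fill the `rate` ticks that lead up to and include observation i + 1.
    for (let k = 1; k <= rate; k++) {
      ticks.push(from + ((to - from) * k) / rate);
    }
  }
  return ticks; // length is rate * (n - 1)
}

// Mean absolute error between predictions and actual values, in bpm.
function meanAbsoluteError(predicted: number[], actual: number[]): number {
  if (predicted.length === 0 || predicted.length !== actual.length) {
    throw new Error("inputs must be non-empty and of equal length");
  }
  const total = predicted.reduce(
    (sum, value, i) => sum + Math.abs(value - actual[i]),
    0,
  );
  return total / predicted.length;
}
```

For example, `backwardInterpolate([120, 130])` yields `[122, 124, 126, 128, 130]`: five values for n = 2 observations.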

Summary

{
  "userId": "1234567890",
  "activityId": 9480958402,
  "activityName": "Indoor Cycling",
  "durationInSeconds": 3667,
  "startTimeInSeconds": 1661158927,
  "startTimeOffsetInSeconds": 7200,
  "activityType": "INDOOR_CYCLING",
  "averageHeartRateInBeatsPerMinute": 150,
  "activeKilocalories": 561,
  "deviceName": "instinct2",
  "maxHeartRateInBeatsPerMinute": 190
}

Samples data

[
  {
    "recording-rate": 5,
    "sample-type": "0",
    "data": "86,87,88,88,88,90,91"
  },
  {
    "recording-rate": 5,
    "sample-type": "2",
    "data": "120,126,122,140,142,155,145"
  },
  {
    "recording-rate": 5,
    "sample-type": "2",
    "data": "141,147,155,160,180,152,120"
  },
  {
    "recording-rate": 5,
    "sample-type": "0",
    "data": "86,87,88,88,88,90,91"
  },
  {
    "recording-rate": 5,
    "sample-type": "1",
    "data": "143,87,88,88,88,90,91"
  },
  {
    "recording-rate": 5,
    "sample-type": "2",
    "data": "143,151,164,null,173,181,180"
  },
  {
    "recording-rate": 5,
    "sample-type": "2",
    "data": "182,170,188,181,174,172,158"
  },
  {
    "recording-rate": 5,
    "sample-type": "3",
    "data": "143,87,88,88,88,90,91"
  }
]

Laps

[
  {
    "startTimeInSeconds": 1661158927,
    "airTemperatureCelsius": 28,
    "heartRate": 109,
    "totalDistanceInMeters": 15,
    "timerDurationInSeconds": 600
  },
  {
    "startTimeInSeconds": 1661158929,
    "airTemperatureCelsius": 28,
    "heartRate": 107,
    "totalDistanceInMeters": 30,
    "timerDurationInSeconds": 900
  }
]

