Merge branch 'develop' into develop
hbredin authored Jun 19, 2024
2 parents a2b01d1 + 50b21d4 commit 7d73d2c
Showing 14 changed files with 2,335 additions and 10 deletions.
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -2,18 +2,37 @@

## develop

### Fixes

- fix: fix support for `numpy==2.x` ([@ibevers](https://github.com/ibevers/))

## Version 3.3.0 (2024-06-14)

### TL;DR

`pyannote.audio` does [speech separation](https://hf.co/pyannote/speech-separation-ami-1.0): multi-speaker audio in, one audio channel per speaker out!

```bash
pip install pyannote.audio[separation]==3.3.0
```

### New features

- feat(task): add `PixIT` joint speaker diarization and speech separation task (with [@joonaskalda](https://github.com/joonaskalda/))
- feat(model): add `ToTaToNet` joint speaker diarization and speech separation model (with [@joonaskalda](https://github.com/joonaskalda/))
- feat(pipeline): add `SpeechSeparation` pipeline (with [@joonaskalda](https://github.com/joonaskalda/))
- feat(io): add option to select torchaudio `backend`

### Fixes

- fix(task): fix wrong train/development split when training with (some) meta-protocols ([#1709](https://github.com/pyannote/pyannote-audio/issues/1709))
- fix(task): fix metadata preparation with missing validation subset ([@clement-pages](https://github.com/clement-pages/))

### Improvements

- improve(io): when available, default to using `soundfile` backend
- improve(pipeline): do not extract embeddings when `max_speakers` is set to 1
- improve(pipeline): optimize memory usage of most pipelines ([#1713](https://github.com/pyannote/pyannote-audio/pull/1713) by [@benniekiss](https://github.com/benniekiss/))

## Version 3.2.0 (2024-05-08)

11 changes: 5 additions & 6 deletions pyannote/audio/core/inference.py
@@ -526,7 +526,7 @@ def aggregate(
warm_up: Tuple[float, float] = (0.0, 0.0),
epsilon: float = 1e-12,
hamming: bool = False,
-missing: float = np.NaN,
+missing: float = np.nan,
skip_average: bool = False,
) -> SlidingWindowFeature:
"""Aggregation
@@ -559,9 +559,6 @@ def aggregate(
step=frames.step,
)

-masks = 1 - np.isnan(scores)
-scores.data = np.nan_to_num(scores.data, copy=True, nan=0.0)

# Hamming window used for overlap-add aggregation
hamming_window = (
np.hamming(num_frames_per_chunk).reshape(-1, 1)
@@ -613,11 +610,13 @@ def aggregate(
)

# loop on the scores of sliding chunks
-for (chunk, score), (_, mask) in zip(scores, masks):
+for chunk, score in scores:
# chunk ~ Segment
# score ~ (num_frames_per_chunk, num_classes)-shaped np.ndarray
# mask ~ (num_frames_per_chunk, num_classes)-shaped np.ndarray

+mask = 1 - np.isnan(score)
+np.nan_to_num(score, copy=False, nan=0.0)

start_frame = frames.closest_frame(chunk.start + 0.5 * frames.duration)

aggregated_output[start_frame : start_frame + num_frames_per_chunk] += (
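The `inference.py` change above replaces the `np.NaN` alias, which NumPy 2.0 removed, with `np.nan`, and moves NaN masking from a single up-front pass into the per-chunk loop. A minimal sketch of that per-chunk masking, assuming plain NumPy arrays in place of `SlidingWindowFeature` and leaving out the Hamming and warm-up weighting; the helper `masked_overlap_add` and its arguments are hypothetical and not part of `pyannote.audio`:

```python
import numpy as np


def masked_overlap_add(chunks, starts, num_frames, epsilon=1e-12, missing=np.nan):
    """Average overlapping chunk scores, ignoring NaN entries.

    Hypothetical helper: `chunks` is a list of (frames, classes) arrays and
    `starts` gives the first output frame covered by each chunk.
    """
    num_classes = chunks[0].shape[1]
    total = np.zeros((num_frames, num_classes), dtype=np.float64)
    count = np.zeros((num_frames, num_classes), dtype=np.float64)

    for start, score in zip(starts, chunks):
        # Work on a copy so the caller's arrays are left untouched
        # (an assumption of this sketch; the patched loop edits `score` directly).
        score = np.array(score, dtype=np.float64)

        # Build the mask for this chunk only: 1 where valid, 0 where NaN.
        mask = 1 - np.isnan(score)
        # Replace NaNs with zeros in place so they add nothing to the sum.
        np.nan_to_num(score, copy=False, nan=0.0)

        stop = start + score.shape[0]
        total[start:stop] += score * mask
        count[start:stop] += mask

    average = total / np.maximum(count, epsilon)
    # Frames that no chunk covered get the `missing` value.
    average[count == 0.0] = missing
    return average


chunks = [
    np.array([[1.0], [np.nan]]),
    np.array([[3.0], [5.0]]),
]
aggregated = masked_overlap_add(chunks, starts=[0, 1], num_frames=4)
```

Computing the mask inside the loop means no full-size `masks` array has to exist alongside `scores`, which is presumably the point of the change.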
4 changes: 3 additions & 1 deletion pyannote/audio/core/task.py
@@ -595,7 +595,9 @@ def prepare_data(self):
prepared_data["metadata-labels"] = np.array(unique_labels, dtype=np.str_)
unique_labels.clear()

-self.prepare_validation(prepared_data)
+if self.has_validation:
+    self.prepare_validation(prepared_data)

self.post_prepare_data(prepared_data)

# save prepared data on the disk
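The `task.py` change above skips validation preparation when the protocol has no validation subset. A minimal sketch of that guard, assuming a hypothetical `ToyTask` class; only the names `has_validation`, `prepare_validation`, and `prepare_data` come from the diff, and their bodies here are illustrative:

```python
class ToyTask:
    """Hypothetical stand-in for a task; not the pyannote.audio class."""

    def __init__(self, validation_files=None):
        self.validation_files = validation_files or []

    @property
    def has_validation(self):
        # Assumption: validation exists when at least one file is listed.
        return len(self.validation_files) > 0

    def prepare_validation(self, prepared_data):
        prepared_data["validation"] = list(self.validation_files)

    def prepare_data(self):
        prepared_data = {}
        # Only prepare validation data when there is something to prepare.
        if self.has_validation:
            self.prepare_validation(prepared_data)
        return prepared_data


with_validation = ToyTask(["a.wav"]).prepare_data()
without_validation = ToyTask().prepare_data()
```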