PyValentin is a sophisticated matchmaking system that uses multi-dimensional distance calculations, compatibility filtering, and grade-based matching to create optimal pairs from survey responses. The system processes raw survey data through several stages, applying mathematical models to quantify compatibility.
- Features
- Setup
- Usage
- The Beautifully Simple Math Behind PyValentin
- File Structure
- Configuration
- Input Format
- Output Files
- Customization Guide
- Technical Details
- Troubleshooting
- Contact & Support
- License
- Multi-dimensional compatibility analysis
- Grade-based matching with configurable weights
- Customizable gender/preference filtering
- Quality vs. quantity optimization
- Grade difference consideration
- Multiple matching algorithms (Greedy and Hungarian)
- Interactive GUI with progress tracking
- Drag-and-drop file support
- Comprehensive results output
- Automatic dependency management
- Python 3.8+
- Required packages (automatically installed): tkinterdnd2, numpy, scipy
- Download the latest .ZIP release
- Run python core/update_dependencies.py to check and install dependencies
- Configure your settings files:
- Config.json (response mappings)
- Filter.json (preference rules)
- defaults.json (default file paths)
- Launch the application:
python main.py
- Select required files:
- Survey responses (CSV)
- Configuration file (JSON)
- Filter rules (JSON)
- Grade data (CSV)
- Adjust sliders:
- Quality-quantity balance
- Grade weight importance
- Click "Process Files"
- Check the genR folder for results
Before processing, all categorical survey responses are converted to numerical values using the Config.json mapping. This creates a consistent numerical space for calculations.
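A minimal sketch of this conversion step, assuming Config.json maps each response text to a numeric string (as in the template in the Configuration section). The function name `encode_responses` and the sample mapping are hypothetical and are not taken from FixCSV.py:

```python
import json


def encode_responses(answers, mapping):
    """Convert one participant's categorical answers to floats.

    `mapping` is the parsed Config.json dict (response text -> numeric value).
    An unmapped response raises KeyError, so gaps in the mapping surface early.
    """
    return [float(mapping[answer]) for answer in answers]


# Hypothetical mapping standing in for the contents of Config.json.
mapping = json.loads(
    '{"Strongly disagree": "1", "Neutral": "3", "Strongly agree": "5"}'
)
encoded = encode_responses(["Neutral", "Strongly agree"], mapping)
print(encoded)  # [3.0, 5.0]
```

Failing loudly on an unmapped response is a design choice of this sketch; the actual preprocessing may handle missing mappings differently.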
For each pair of users (i,j), we calculate a multi-dimensional Euclidean distance:
distance(i,j) = √( Σ_k (x[i,k] - x[j,k])² )
where x[i,k] is user i's numerical response to question k, and the sum runs over all survey questions k
This produces a distance matrix D where D[i,j] represents how different two users are across all responses.
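The distance matrix can be sketched in plain Python as follows. This is an illustration only: the function name `distance_matrix` is hypothetical, and the project itself lists numpy as a dependency, so the real code in Ski.py is likely vectorized rather than written with explicit loops.

```python
import math


def distance_matrix(vectors):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(vectors)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            )
            D[i][j] = D[j][i] = d  # distance is symmetric
    return D


# Three users, two questions each (already converted to numbers).
responses = [[1.0, 5.0], [4.0, 1.0], [1.0, 5.0]]
D = distance_matrix(responses)
print(D[0][1])  # 5.0
print(D[0][2])  # 0.0
```

Users 0 and 2 gave identical answers, so their distance is 0; users 0 and 1 differ by 3 and 4 on the two questions, giving √(9 + 16) = 5.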
The distance matrix is transformed into a similarity matrix using:
similarity(i,j) = 1 / (1 + distance(i,j))
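This maps every distance into the interval (0, 1]:
- 1.0 = perfect match (distance 0, identical responses)
- Values approaching 0.0 = increasingly strong mismatch (the score never reaches exactly 0.0 for a finite distance)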
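A short sketch of the transform, with a few sample values to show how quickly the score falls off as distance grows. The function name `similarity` is chosen here for illustration:

```python
def similarity(distance):
    """Map a non-negative distance to a similarity score in (0, 1]."""
    return 1.0 / (1.0 + distance)


print(similarity(0.0))  # 1.0
print(similarity(1.0))  # 0.5
print(similarity(9.0))  # 0.1
```

Because the score is already halved at a distance of 1, the scale of the values in Config.json has a direct effect on how spread out the similarity scores are.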
The system applies a boolean matrix F where:
F[i,j] = 1 if preferences match
F[i,j] = 0 if preferences conflict
The final compatibility score becomes:
compatibility(i,j) = similarity(i,j) × F[i,j]
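The filtering step can be sketched as below. This assumes the rules map a seeker's "attracted to" code to a list of acceptable gender codes, and that a pair passes only when each person is acceptable to the other. Both the mutual check and the names `build_filter_matrix` and `apply_filter` are assumptions made for this example, not confirmed details of Ski.py.

```python
def build_filter_matrix(people, filters):
    """Return F where F[i][j] = 1 if i and j accept each other, else 0."""
    n = len(people)
    F = [[0] * n for _ in range(n)]
    for i, seeker in enumerate(people):
        for j, candidate in enumerate(people):
            if i == j:
                continue  # nobody is matched with themselves
            i_accepts_j = candidate["gender"] in filters.get(seeker["attracted_to"], [])
            j_accepts_i = seeker["gender"] in filters.get(candidate["attracted_to"], [])
            F[i][j] = 1 if (i_accepts_j and j_accepts_i) else 0
    return F


def apply_filter(S, F):
    """Element-wise product: compatibility(i,j) = similarity(i,j) * F[i][j]."""
    return [[s * f for s, f in zip(s_row, f_row)] for s_row, f_row in zip(S, F)]


# Hypothetical rules: seeker code -> acceptable gender codes.
filters = {"1": ["1"], "2": ["2"]}
people = [
    {"gender": "1", "attracted_to": "2"},
    {"gender": "2", "attracted_to": "1"},
    {"gender": "2", "attracted_to": "2"},
]
S = [[1.0, 0.5, 0.8], [0.5, 1.0, 0.2], [0.8, 0.2, 1.0]]

F = build_filter_matrix(people, filters)
C = apply_filter(S, F)
print(F)     # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(C[0])  # [0.0, 0.5, 0.0]
```

Users 0 and 2 have the highest raw similarity (0.8), but user 2 does not accept user 0's gender, so the filter zeroes that pair out.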
The system uses a modified stable marriage algorithm with the following steps:
- Sort users by number of potential matches
- For each user i:
- Find top N matches based on quality weight
- Select best available match j
- Remove both i and j from available pool
Quality weight affects match selection:
- High (>0.5): Selects from top 25% of matches
- Low (<0.5): Considers up to 75% of matches
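The steps above can be sketched roughly as follows. Several details are assumptions made for this example: users with the fewest candidates go first, a weight of exactly 0.5 is treated like a low weight, and "top N" is rounded up to at least one candidate. The function name `greedy_pairs` is hypothetical.

```python
import math


def greedy_pairs(C, quality_weight):
    """Greedy pairing on a compatibility matrix C; returns (pairs, unmatched)."""
    n = len(C)
    # Each user's candidates: everyone with non-zero compatibility, best first.
    candidates = {
        i: sorted(
            (j for j in range(n) if j != i and C[i][j] > 0),
            key=lambda j: C[i][j],
            reverse=True,
        )
        for i in range(n)
    }
    # High quality weight: only the top 25% of candidates are considered.
    fraction = 0.25 if quality_weight > 0.5 else 0.75
    available = set(range(n))
    pairs = []
    # Assumed ordering: users with the fewest options choose first.
    for i in sorted(range(n), key=lambda u: len(candidates[u])):
        if i not in available or not candidates[i]:
            continue
        top_n = max(1, math.ceil(fraction * len(candidates[i])))
        for j in candidates[i][:top_n]:
            if j in available:
                pairs.append((i, j))
                available -= {i, j}
                break
    return pairs, sorted(available)


C = [
    [0.0, 0.9, 0.4, 0.0],
    [0.9, 0.0, 0.6, 0.5],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
]
print(greedy_pairs(C, 0.8))  # ([(3, 1)], [0, 2])
print(greedy_pairs(C, 0.2))  # ([(3, 1), (0, 2)], [])
```

With a high quality weight, users 0 and 2 stay unmatched because their single top candidate is already taken; with a low weight they fall back to each other, trading match quality for coverage.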
For remaining unmatched users:
- Create a graph of mutual matches
- Find maximal matching using a greedy algorithm
- Optimize for global satisfaction using local improvements
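One way to picture this second pass is sketched below: pick edges greedily by score to get a maximal matching, then repeatedly swap partners between two pairs whenever that raises the combined score. The swap rule and the function names are assumptions for illustration; the source does not specify which local improvements are used.

```python
def greedy_maximal_matching(users, C):
    """Pick the highest-scoring edges first until no edge can be added."""
    edges = sorted(
        (
            (C[a][b], a, b)
            for idx, a in enumerate(users)
            for b in users[idx + 1:]
            if C[a][b] > 0
        ),
        reverse=True,
    )
    matched, pairs = set(), []
    for _score, a, b in edges:
        if a not in matched and b not in matched:
            pairs.append((a, b))
            matched.update((a, b))
    return pairs


def find_improving_swap(pairs, C):
    """Return (x, y, new_pair_x, new_pair_y) for the first swap that helps."""
    for x in range(len(pairs)):
        for y in range(x + 1, len(pairs)):
            (a, b), (c, d) = pairs[x], pairs[y]
            current = C[a][b] + C[c][d]
            for p, q in (((a, c), (b, d)), ((a, d), (b, c))):
                sp, sq = C[p[0]][p[1]], C[q[0]][q[1]]
                # Both new pairs must pass the filter (score > 0).
                if sp > 0 and sq > 0 and sp + sq > current:
                    return x, y, p, q
    return None


def improve_by_swaps(pairs, C):
    """Apply improving swaps until none remain (total score strictly rises)."""
    pairs = list(pairs)
    swap = find_improving_swap(pairs, C)
    while swap is not None:
        x, y, p, q = swap
        pairs[x], pairs[y] = p, q
        swap = find_improving_swap(pairs, C)
    return pairs


C = [
    [0.0, 0.9, 0.8, 0.0],
    [0.9, 0.0, 0.0, 0.8],
    [0.8, 0.0, 0.0, 0.1],
    [0.0, 0.8, 0.1, 0.0],
]
initial = greedy_maximal_matching([0, 1, 2, 3], C)
improved = improve_by_swaps(initial, C)
print(initial)   # [(0, 1), (2, 3)]
print(improved)  # [(0, 2), (1, 3)]
```

The greedy pass grabs the single best edge (0.9) and leaves users 2 and 3 with a weak 0.1 match; the swap raises the combined score from 1.0 to 1.6.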
The system incorporates grade differences into the matching process:
final_score = (1 - grade_weight) * compatibility_score + grade_weight * (1 - grade_penalty)
where:
grade_penalty = {
0: 0.0, # Same grade
1: 0.3, # One grade difference
2: 0.7, # Two grades difference
3: 0.9 # Three+ grades difference
}
The grade_weight slider (0.0-1.0) determines the importance of grade matching:
- 0.0: Ignore grades entirely
- 0.7: Recommended balance (default)
- 1.0: Prioritize grade matching above all else
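As a worked example of the formula, here is a small sketch. The function name `final_score` and the capping of the grade difference at 3 (to cover the "three or more" case) are choices made for this illustration.

```python
GRADE_PENALTY = {0: 0.0, 1: 0.3, 2: 0.7, 3: 0.9}


def final_score(compatibility_score, grade_a, grade_b, grade_weight=0.7):
    """Blend compatibility with a grade-difference penalty."""
    diff = min(abs(grade_a - grade_b), 3)  # 3 stands for "three or more"
    penalty = GRADE_PENALTY[diff]
    return (1 - grade_weight) * compatibility_score + grade_weight * (1 - penalty)


# Compatibility 0.8, one grade apart, default weight 0.7:
# 0.3 * 0.8 + 0.7 * (1 - 0.3) = 0.24 + 0.49 = 0.73
print(round(final_score(0.8, 10, 11), 2))  # 0.73
```

Note that at the default weight of 0.7, two users in the same grade score at least 0.7 even with zero compatibility, so the grade term dominates the ranking.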
PyValentin/
├── main.py # Main application
├── FixCSV.py # Data preprocessing
├── Ski.py # Core algorithms
├── PyValentin.py # Improved UI
├── genR/ # Generated results
├── ASF Specific/ # Configuration files
└── README.md # Documentation
Defines mappings from survey responses to numerical values:
{
  "Response Text": "Numerical Value"
}
Defines preference matching rules:
{
  "filterables": {
    "1": "Male",
    "2": "Female"
  },
  "filters": {
    "5": ["1", "2", "3"],
    "4": ["1", "2"]
  }
}
Required CSV columns, in this order:
- Timestamp
- Email
- Gender (a)
- Attracted to (b)
- Question responses...
Example:
Timestamp,Email,Gender,Attracted To,Q1,Q2,...
2024-01-01,user@example.com,Male,Female,Response1,Response2,...
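A minimal sketch of reading this format with the standard library, assuming the column names shown in the example header. The function name `read_survey`, the `META_COLUMNS` tuple, and the placeholder address are hypothetical; the project's own preprocessing lives in FixCSV.py and may differ.

```python
import csv
import io

# Columns that describe the participant rather than answer a question.
META_COLUMNS = ("Timestamp", "Email", "Gender", "Attracted To")

sample = """Timestamp,Email,Gender,Attracted To,Q1,Q2
2024-01-01,user@example.com,Male,Female,Response1,Response2
"""


def read_survey(handle):
    """Split each CSV row into profile fields and an ordered answer list."""
    rows = []
    for row in csv.DictReader(handle):
        answers = [value for key, value in row.items() if key not in META_COLUMNS]
        rows.append(
            {
                "email": row["Email"],
                "gender": row["Gender"],
                "attracted_to": row["Attracted To"],
                "answers": answers,
            }
        )
    return rows


rows = read_survey(io.StringIO(sample))
print(rows[0]["answers"])  # ['Response1', 'Response2']
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` object.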
- modified_csv.csv: Normalized survey data
- processed_distances.csv: Distance matrix
- similarity_list.csv: Similarity scores
- filtered_similarity_list.csv: Filtered matches
- optimal_pairs_greed.csv: Greedy algorithm pairs
- optimal_pairs_gluttony.csv: Hungarian algorithm pairs
- optimal_pairs_with_info_greed.csv: Detailed greedy matches
- optimal_pairs_with_info_gluttony.csv: Detailed Hungarian matches
- unpaired_entries_greed.csv: Unmatched users (greedy)
- unpaired_entries_gluttony.csv: Unmatched users (Hungarian)
- Update Config.json with new response mappings
- Modify Filter.JSON matching rules
- Adjust weights in the calculate_distances() function in Ski.py
- Add response mappings to Config.json
- Update CSV processing in FixCSV.py if needed
- Modify distance calculation in Ski.py
Modify Filter.JSON:
{
"filterables": {
"value": "label"
},
"filters": {
"seeker_value": ["acceptable_values"]
}
}
- Uses normalized Euclidean distance
- Configurable weights per question
- Range: 0 (identical) to 1 (maximum difference)
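One way a weighted Euclidean distance can be scaled to the 0 to 1 range is sketched below. The normalization shown (dividing by the largest distance possible given the weights and the answer range) is an assumption for illustration; the source does not state how calculate_distances() normalizes, and `normalized_weighted_distance` and `value_range` are hypothetical names.

```python
import math


def normalized_weighted_distance(x, y, weights, value_range):
    """Weighted Euclidean distance scaled to [0, 1].

    `value_range` is the largest possible difference on a single question
    (maximum mapped value minus minimum mapped value).
    """
    numerator = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    denominator = sum(w * value_range ** 2 for w in weights)
    return math.sqrt(numerator / denominator)


# Answers on a 1-5 scale, so the per-question range is 4.
print(normalized_weighted_distance([1, 5], [1, 5], [1, 1], 4))  # 0.0
print(normalized_weighted_distance([1, 5], [5, 1], [1, 1], 4))  # 1.0
print(normalized_weighted_distance([1, 1], [5, 1], [1, 3], 4))  # 0.5
```

In the last case the users disagree completely on the first question, but the second question carries three times the weight and they agree on it, so the distance is only 0.5.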
- Converts responses to numerical values
- Calculates distance matrix
- Generates similarity scores
- Applies filtering rules
- Handles edge cases
- O(n²) complexity for n participants
- Memory usage: ~100MB for 1000 participants
- Processing time: ~1-2 seconds per 100 participants
- Missing dependencies
  - Run pip install tkinterdnd2 numpy scipy
  - Check Python version (3.8+ required)
- File format errors
  - Verify CSV column order
  - Check JSON syntax
  - Ensure UTF-8 encoding
- No matches found
  - Verify filter rules
  - Check response mappings
  - Confirm gender/attraction values
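For the JSON syntax check, a quick sketch using only the standard library can point to where a settings file is malformed. The helper name `check_json` is hypothetical, and the exact error wording varies between Python versions.

```python
import json


def check_json(text):
    """Return None if `text` is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


print(check_json('{"1": "Male", "2": "Female"}'))  # None

# A trailing comma is a common mistake and is not valid JSON.
print(check_json('{"1": "Male",}') is not None)  # True
```

To check a file on disk, read it with `open(path, encoding="utf-8").read()` and pass the result to the helper; a `UnicodeDecodeError` at that point indicates the file is not UTF-8 encoded.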
Add debug logging:
import logging
logging.basicConfig(level=logging.DEBUG)
For questions about this project, contact Nagusame CS on GitHub.
This project is licensed under the GNU General Public License v3.0 - see below for details:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.