PRegress is a Python package for regression analysis and data visualization. It provides tools for model fitting, prediction, and a range of plots for exploring your data, with a focus on regression analysis.
- Model fitting and prediction with convenient formula notation
- Streamlined code for plotting (boxplot, histogram, scatter plot, etc.)
- Regression analysis diagnostic tools
- Integration with popular libraries like pandas and statsmodels
You can install the package using pip:
pip install pregress
Here are some examples of how to use the key functions in the package.
To use the functions provided by the package, import it as follows:
import pregress as pr
PRegress ships with several datasets, which can be loaded using the get_data function. The datasets currently available are:
- AirBnb.csv
- Betas.csv
- Charges.csv
- Employment.csv
- HousePrices.csv
- HR_retention.csv
- MarketingToys.csv
- Sales.csv
- Top200.csv
- Twitter.csv
- Youtube.csv
See Applied Linear Regression for Business Analytics with Python for details regarding these datasets. Example of loading a dataset:
import pregress as pr
# Load data from PRegress
df = pr.get_data("Betas.csv")
# Format the data (for later)
df.drop(columns=df.columns[0], inplace=True)
PRegress supports formula notation similar to R's. Fit a model with a formula:
# Fit model with formula
model = pr.fit("SPY ~ .", df)
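To make the `.` shorthand concrete, here is a minimal sketch of how an R-style formula such as "SPY ~ ." can be read, assuming `.` means "every column other than the response", as it does in R. The `expand_formula` helper and the `OTHER` column are hypothetical and used only for illustration; this is not PRegress's actual parser.

```python
def expand_formula(formula, columns):
    """Split 'y ~ x1 + x2' into (response, predictors), expanding '.'.

    Hypothetical helper for illustration; not part of PRegress.
    """
    response, rhs = (part.strip() for part in formula.split("~", 1))
    predictors = []
    for term in (t.strip() for t in rhs.split("+")):
        # '.' stands for every column except the response (R convention)
        candidates = [c for c in columns if c != response] if term == "." else [term]
        for name in candidates:
            if name not in predictors:
                predictors.append(name)
    return response, predictors


# Illustrative column names; 'OTHER' is a placeholder, not a real column
columns = ["SPY", "MSFT", "OTHER"]
print(expand_formula("SPY ~ .", columns))
print(expand_formula("MSFT ~ SPY", columns))
```

Under this reading, "SPY ~ ." regresses SPY on all remaining columns, which is why the sample above drops the unneeded first column before fitting.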
Summary types are specified using the out argument. The available summaries are:
- statsmodels (default)
- R
- STATA
- simple
- ANOVA
- coefficients (coef)
# Generate a model summary
pr.summary(model)
A statsmodels object is created by default, and it can be passed to the predict function. Since df is the DataFrame used to fit the model, the following two calls produce the same result.
# Make predictions
pr.predict(model, df)
# Produce fitted values
pr.predict(model)
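The two calls agree because fitted values are simply the model's predictions evaluated on the training data. The sketch below illustrates this with a hand-rolled simple linear regression in plain Python. The `fit_line` and `predict` helpers and the toy numbers are assumptions made for illustration and are unrelated to PRegress's internals.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(params, x):
    """Evaluate the fitted line at each value in x."""
    intercept, slope = params
    return [intercept + slope * xi for xi in x]


# Toy data (made up for this example)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

params = fit_line(x, y)
fitted = predict(params, x)  # fitted values: predictions on the training data
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(params, fitted)
```

Passing different data to `predict` gives out-of-sample predictions instead, which is the case the explicit data argument exists for.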
Plotting code is streamlined and built on top of seaborn and Matplotlib. Some examples are provided below.
# Generate a boxplot
pr.boxplot("SPY ~ .", df)
# Generate a histogram
pr.hist(df.SPY)
# Multiple histograms
pr.hists("SPY ~ .", data=df)
# Scatter plot
pr.plotXY("MSFT ~ SPY", data=df)
# Multiple scatter plots
pr.plots("SPY ~ .", data=df)
# Correlation Plot
pr.plot_cor(df)
Based on current testing, the following fixes are required:
- Ensure global scope accessibility for variables.
- Adjust summary spacing.
- Provide compatibility with scikit-learn.
- Implement AI-generated summaries.
- Allow for additional plotting customization (using kwargs).
- Review and improve diagnostic plots.
- Provide support for logistic regression and other GLMs.
- Provide support for automatic dummy variable retrieval.
- Plots should work without formulas.
We welcome contributions to PRegress! If you find a bug or have a feature request, please open an issue on GitHub. You can also contribute by:
- Forking the repository
- Creating a new branch (git checkout -b feature-branch)
- Committing your changes (git commit -am 'Add some feature')
- Pushing to the branch (git push origin feature-branch)
- Creating a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
We would like to thank all contributors and users of PRegress for their support and feedback. Special thanks to Mintra Putlek!