Update README file to indicate new features and available options

ydataai · Jan 6, 2018 · 65d6db2
1 parent 6a4a810
commit 65d6db2
Showing 1 changed file with 16 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # pandas-profiling 
 
-Generates profile reports from a pandas DataFrame. The *df.describe()* function is great but a little basic for serious exploratory data analysis. 
+Generates profile reports from a pandas `DataFrame`. The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis.
 
 For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:
 
@@ -9,6 +9,7 @@ For each column the following statistics - if relevant for the column type - are
 * **Descriptive statistics** like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
 * **Most frequent values**
 * **Histogram**
+* **Correlations** highlighting highly correlated variables, with Spearman and Pearson matrices
 
 ## Demo
 
@@ -60,25 +61,35 @@ To retrieve the list of variables which are rejected due to high correlation:
     profile = pandas_profiling.ProfileReport(df)
     rejected_variables = profile.get_rejected_variables(threshold=0.9)
 
-If you want to generate a HTML report file, save the ProfileReport to an object and use the *to_file()* function:
+If you want to generate an HTML report file, save the `ProfileReport` to an object and use the `to_file()` function:
 
     profile = pandas_profiling.ProfileReport(df)
     profile.to_file(outputfile="/tmp/myoutputfile.html")
 
 ### Python
 
-For standard formatted CSV files that can be read immediately by pandas, you can use the **profile_csv.py** script. Run
+For standard formatted CSV files that can be read immediately by pandas, you can use the `profile_csv.py` script. Run
 
 	python profile_csv.py -h
 
 for information about options and arguments.
 
+### Advanced usage
+
+A set of options is available to adapt the generated report.
+
+* `bins` (`int`): Number of bins in histogram (10 by default).
+* Correlation settings:
+    * `check_correlation` (`boolean`): Whether or not to check correlation (`True` by default).
+    * `correlation_threshold` (`float`): Threshold above which a pair of variables is considered correlated (0.9 by default).
+    * `correlation_overrides` (`list`): Names of variables that should not be rejected even if they are correlated (`None` by default).
+    * `check_recoded` (`boolean`): Whether or not to check recoded correlation (`False` by default). This is an expensive computation, so it is best suited to small datasets.
+* `pool_size` (`int`): Number of workers in the thread pool. The default is equal to the number of CPUs.
+
 ## Dependencies
 
 * **An internet connection.** Pandas-profiling requires an internet connection to download the Bootstrap and JQuery libraries. I might change this in the future, let me know if you want that sooner than later.
 * python (>= 2.7)
 * pandas (>=0.19)
 * matplotlib  (>=1.4)
 * six (>=1.9)
-
-
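
The "Advanced usage" section added in this diff lists the options and their defaults but does not show how they are supplied. The sketch below collects the documented defaults in a dictionary and merges user overrides onto them. `DEFAULT_OPTIONS` and `build_options` are hypothetical names introduced only for illustration, and using `multiprocessing.cpu_count()` for the `pool_size` default is an assumption about how "number of CPUs" is determined. Passing the result to `ProfileReport` as keyword arguments is likewise an assumption, which is why that call is left commented out.

```python
import multiprocessing

# Defaults as documented in the "Advanced usage" section of the README.
# Assumption: "number of CPUs" corresponds to multiprocessing.cpu_count().
DEFAULT_OPTIONS = {
    "bins": 10,
    "check_correlation": True,
    "correlation_threshold": 0.9,
    "correlation_overrides": None,
    "check_recoded": False,
    "pool_size": multiprocessing.cpu_count(),
}


def build_options(**overrides):
    """Hypothetical helper: merge overrides onto the documented defaults.

    Raises ValueError for any option name that the README does not list.
    """
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError("Unknown option(s): " + ", ".join(sorted(unknown)))
    options = dict(DEFAULT_OPTIONS)
    options.update(overrides)
    return options


# Example: finer histograms, recoded check enabled, and one column
# (the name "age" is made up) protected from rejection.
options = build_options(bins=20, check_recoded=True, correlation_overrides=["age"])

# Assumption: the options are accepted as keyword arguments.
# import pandas_profiling
# profile = pandas_profiling.ProfileReport(df, **options)
```

Validating option names up front is a design choice of this sketch: a misspelled keyword such as `bin` fails immediately instead of being silently ignored.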