MCA having problems with pandas CategoricalIndex #11

Open
omrihar opened this issue Nov 3, 2016 · 1 comment

omrihar commented Nov 3, 2016

It seems that, ironically, mca has trouble handling categorical data. My DataFrame has a categorical column created with pd.qcut(), and when I run mca on this data I get the following error:

AttributeError: 'Int64Index' object has no attribute '_is_dtype_compat'

I can work around this by applying pd.to_numeric() to the offending column, after which everything works. Here is a minimal example that does not yield exactly the same error but still fails on the categorical column:

import numpy as np
import pandas as pd
import mca

a = np.random.normal(size=20)
aa = pd.qcut(a, 4, labels=list("1234"))  # Categorical with labels '1'..'4'
b = list(range(20))
c = list('abcdefghijklmnopqrst')
# Build the frame column-wise so that column 'a' keeps its categorical dtype
df = pd.DataFrame({'a': aa, 'b': b, 'c': c}, index=list(range(20)))
# df.a = pd.to_numeric(df.a, downcast='integer')   # <- this fixes the problem
mymca = mca.mca(df, cols=['a','b'])

Running the above code yields the following error:

TypeError: cannot append a non-category item to a CategoricalIndex

Uncommenting the commented line fixes the problem. Hope this helps :-)

Cheers,
Omri
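To check the workaround in isolation, without mca, here is a small sketch assuming a recent pandas and NumPy (the seeded `default_rng` generator is an addition, only there to make runs repeatable). It shows that pd.qcut with string labels yields a categorical column and that pd.to_numeric turns it back into integers:

```python
import numpy as np
import pandas as pd

# Rebuild the column from the report: pd.qcut with string labels returns
# a pandas Categorical, so column 'a' ends up with the 'category' dtype.
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame({
    "a": pd.qcut(rng.normal(size=20), 4, labels=list("1234")),
    "b": list(range(20)),
})
dtype_before = df["a"].dtype
print(dtype_before)  # category

# The workaround from the report: parse the labels '1'..'4' as integers.
df["a"] = pd.to_numeric(df["a"], downcast="integer")
dtype_after = df["a"].dtype
print(dtype_after)
```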

esafak (Owner) commented Dec 22, 2016

Thanks for reporting! I can reproduce the problem and am open to suggestions on how to solve it. It seems to be caused by the first column being categorical, so detecting that case and casting the column to numeric should fix it. For example, these similar DataFrames, in which the categorical column is re-assigned or moved, do not have the problem:

df.astype({'a':int, 'c':'category'})
df.assign(d=df.a).drop('a', axis=1)

What do you think?
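The check-and-cast fix suggested above could be sketched roughly as follows. `decategorize` is a hypothetical helper, not part of mca, and the sketch makes two assumptions beyond the comment: it checks every column (or a given list) rather than only the first one, and it falls back to object dtype when the categories are not numeric, since pd.to_numeric cannot parse labels such as 'a'.

```python
import numpy as np
import pandas as pd


def decategorize(df, cols=None):
    """Return a copy of df with categorical columns turned into plain columns.

    Hypothetical helper (not part of mca): categories that parse as numbers
    are cast with pd.to_numeric; any other categorical column falls back to
    object dtype. Non-categorical columns are left untouched.
    """
    out = df.copy()
    for col in (out.columns if cols is None else cols):
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            try:
                out[col] = pd.to_numeric(out[col])
            except (ValueError, TypeError):
                out[col] = out[col].astype(object)
    return out


# Example frame modeled on the report, with two categorical columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": pd.qcut(rng.normal(size=20), 4, labels=list("1234")),
    "b": list(range(20)),
    "c": pd.Categorical(list("abcdefghijklmnopqrst")),
})
clean = decategorize(df)
print(clean.dtypes)
# The cleaned frame could then be passed to mca as in the report,
# e.g. mca.mca(clean, cols=['a', 'b']).
```

Working on a copy keeps the caller's DataFrame unchanged, so the original categorical columns remain available for labeling results afterwards.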
