TL;DR: pandas.read_csv considers the word “null” as a NaN, which also means “zero” in German. The arguments na_values and keep_default_na offer a solution.
It’s Friday an you set out to build a very sophisticated numbers translator in several languages:
If you want to know how to say 3 in Spanish, you do:
numbers.loc[3, "Spanish"]
'tres'
Nice. You save the super advanced translator for later use and go off for a relaxed weekend.
numbers.to_csv("numbers")
Back to work on Monday, your German friend drops by your office and you want to proudly show what you’ve created. So you load your “translator” and go like: – Ask me how to say any number! – OK. Let’s start easy: how do you say zero in German?
“That I can do”, you think and type:
# Load the awesome translatornumbers_loaded = pd.read_csv("numbers", index_col="Number")# And get the translationnumbers_loaded.loc[0, "German"]
nan
Oh no, that’s no good! You get out of the embarrassing situation saying it is actually a beta version, and, and, and. The harm is done and your friend leaves the office skeptical – to say the least.
What’s was the problem? The answer is in the docstrings of the function pd.read_csv. If we look carefully at which values pandas considers as NaN per default we find the following:
"""na_values : scalar, str, list-like, or dict, optional Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'."""
There it is: “null” is in the list!
Solution: We need to do two things: - Pass other values without “null” (and “NULL” if you’re not sure everything is lowercase). - Tell pandas not to keep the defaults (otherwise it will use both the defaults and the passed values).
Any bugs, questions, comments, suggestions? Ping me on twitter or drop me an e-mail (fabridamicelli at gmail).
Share this article on your favourite platform: