“The 1.0 release does not mean a conclusion, or even slowing down, of pandas’ development.”

pandas 1.0.0 has been released on GitHub by Tom Augspurger, following a release candidate that arrived earlier this month. We interviewed Tom to find out more about this major upgrade from v0.25.3.

Let’s see what he has to say about the new NA scalar, breaking changes and the upgrade process, and what features he is looking forward to implementing in the future.


JAXenter: You’ve made the decision to finally push pandas to version 1.0. What considerations played a part in this?


Tom: We started thinking about pandas 1.0 in earnest at our first developer sprint in July 2018. At the time, we optimistically targeted January 2019 (6 months from the sprint). We ended up needing another 18 months of development.

pandas has been “production ready” for a while now, in the sense that it’s used in production at many institutions. But we still had a few major items we wanted to iron out before calling 1.0:

Clean up the API. We’d accumulated a large number of deprecations for duplicative or unclear behavior. Many of these were enforced for 1.0.
Stabilize the data model. Starting around pandas 0.23 (May 2018), we clarified exactly what kind of data could be stored in a Series or DataFrame. Historically, this was just NumPy arrays or a few “extension types” that pandas defined. The 0.23 release included an interface that specified what kind of array can be stored inside pandas, and over the subsequent releases we refined that interface.
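As a rough illustration of that data model, the sketch below (assuming pandas 1.0 or later) stores an ordinary NumPy array and two pandas-defined arrays in a Series, then checks whether each dtype is a pandas extension dtype. The helper `describe_storage` and the choice of the nullable `Int64` and `category` dtypes are our own illustration, not something named in the interview.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def describe_storage(s: pd.Series) -> tuple:
    """Hypothetical helper: return the dtype name and whether it is an extension dtype."""
    return str(s.dtype), isinstance(s.dtype, ExtensionDtype)


# Backed by an ordinary NumPy array.
plain = pd.Series(np.array([1, 2, 3], dtype="int64"))

# Backed by arrays that pandas defines on top of its extension-array interface.
nullable_ints = pd.Series(pd.array([1, 2, None], dtype="Int64"))
labels = pd.Series(pd.Categorical(["a", "b", "a"]))

for name, s in [("plain", plain), ("nullable_ints", nullable_ints), ("labels", labels)]:
    dtype, is_ext = describe_storage(s)
    print(f"{name}: dtype={dtype}, extension dtype={is_ext}")
```

Only the NumPy-backed Series reports a plain NumPy dtype; the other two are pandas extension types that live inside a Series just like a NumPy array would.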

With these stabilizations in place, we felt that a 1.0 was appropriate.

JAXenter: What is your personal highlight in pandas 1.0?

Tom: The new NA scalar for representing missing values. This is the value used to represent “missing” in our new nullable integer, boolean, and string data types. Historically, we’ve used NaN (not a number), but that had several drawbacks. Most notably, NaN is a float and so cannot be used with integer dtypes. And NaN has some peculiar behavior in logical and comparison operations.
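To make those two drawbacks concrete, here is a short sketch assuming pandas 1.0 or later. The specific values are our own illustration: a missing entry turns a default integer column into `float64`, while the nullable `Int64` dtype keeps the integers and stores `pd.NA`. Comparisons with NA propagate NA, and logical operations follow three-valued (Kleene) logic.

```python
import numpy as np
import pandas as pd

# NaN is a float, so a missing entry forces an integer column to float64.
with_nan = pd.Series([1, 2, None])
print(with_nan.dtype)        # float64

# The nullable "Int64" dtype keeps integers and marks the gap with pd.NA.
with_na = pd.Series([1, 2, None], dtype="Int64")
print(with_na.dtype)         # Int64
print(with_na[2] is pd.NA)   # True

# NaN compares unequal to everything, including itself...
print(np.nan == np.nan)      # False
# ...whereas a comparison involving NA is itself NA ("unknown").
print(pd.NA == pd.NA)        # <NA>

# Logical operations use three-valued (Kleene) logic:
print(pd.NA | True)          # True: the result is True whatever the missing value is
print(pd.NA & False)         # False: the result is False whatever the missing value is
print(pd.NA | False)         # <NA>: the result depends on the missing value
```

One consequence worth knowing: because NA means "unknown", calling `bool(pd.NA)` (for example in an `if` statement) raises a `TypeError` rather than silently picking a value.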


pandas.NA is a new concept in the scientific Python ecosystem, and it’s not clear how other libraries will adapt to handle it. We’re working with other libraries, including NumPy, to discover how we can best handle the concept of “missing data” across the ecosystem.
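Until that ecosystem-wide story settles, one workable approach (our suggestion, not a recommendation from the interview) is to convert NA explicitly when handing data to NumPy-based code. A minimal sketch, assuming pandas 1.0 or later, where `Series.to_numpy` accepts both `dtype` and `na_value`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

# Option 1: map pd.NA to NaN so float-based NumPy code keeps working.
as_float = s.to_numpy(dtype="float64", na_value=np.nan)

# Option 2: fill the gaps first when a true integer array is required.
as_int = s.fillna(0).to_numpy(dtype="int64")

print(as_float.dtype, np.isnan(as_float).sum())  # float64 1
print(as_int)                                     # [1 2 0]
```

Being explicit about `dtype` and `na_value` avoids depending on whatever default conversion a given pandas version applies to arrays that contain NA.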

JAXenter: For developers who use pandas, what will be the most significant changes when upgrading?

Tom: All of our API breaking changes are documented in our release notes. This release had relatively minor breaking changes. The largest change is probably that the (experimental) IntegerArray now uses the new pandas.NA scalar rather than NaN for missing values. When upgrading, we always recommend:

A careful read-through of the release notes.
Trying the release candidate.
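As an illustration of why the IntegerArray change can matter, the sketch below (assuming pandas 1.0 or later) shows a missing entry that is now `pd.NA` instead of NaN. The `x != x` self-inequality check is a common NaN idiom we chose for illustration; it is not named in the interview. With NA it yields NA instead of `True`, and using that result in an `if` raises `TypeError`, whereas `pd.isna` handles both kinds of missing value.

```python
import numpy as np
import pandas as pd

ints = pd.array([1, None, 3], dtype="Int64")

# In pandas 1.0 the missing entry is pd.NA rather than NaN.
missing = ints[1]
print(missing is pd.NA)          # True

# The NaN self-inequality idiom no longer yields a plain bool...
print(missing != missing)        # <NA>

# ...and NA cannot be used where a bool is required.
try:
    if missing != missing:
        pass
except TypeError:
    print("TypeError raised")

# pd.isna recognizes both NaN and NA, so it is a safer check.
print(pd.isna(missing), pd.isna(np.nan))  # True True
print(ints.isna().tolist())                # [False, True, False]
```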

We provide binaries for final releases and release candidates. You can subscribe to our releases on GitHub by “watching” the repository for releases.

Our full installation instructions are available in the pandas documentation.
