[Pandas] Finding a Row Where One of Its Values Is at a Minimum/Maximum

Often, we will want to get to get a specific row, which marks the minimum or maximum of one of its columns. Let’s suppose we have the SF Salaries dataset from Kaggle. We want to find the employee name, with the largest total pay benefits. The experience with writing NumPy/Pandas filter conditions will quickly let us produce the following version:

sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].max()]['EmployeeName']

which is absolutely valid, but is it the only option? I could imagine that that in a large dataset, there would be quite a lot of comparison involved, plus the creation of a whole new data series (the filter condition). Is there perhaps a more performant one? How about trying out idmax() (or, for those coming from NumPy, argmax(), both do the same). By given a column, the function will return the index of the data frame, where it is at its highest.

sal.loc[sal['TotalPayBenefits'].idxmax()]['EmployeeName']

sal.loc[sal['TotalPayBenefits'].argmax()]['EmployeeName']

I did a quick performance check, and indeed, idmax/argmax turn out twice as fast:

# the original filter condition
815 µs ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# idmax()
402 µs ± 7.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# argmax
404 µs ± 8.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Share on

Twitter Facebook Google+ LinkedIn

Designing Well-Structured REST APIs with Flask-RestPlus: Part 1

7 minute read

This is the first part of a two-part series. In this post, I will introduce Flask-RestPlus and demonstrate how to start organising APIs around its simple REST-based conventions. Next time, I will address the topic of request/response marshalling (serialisation) and validation.

KotlinConf 2018 Recap

4 minute read

KotlinConf 2018 took place in beautiful Amsterdam at the beginning of October, and this year, I had the chance to attend. Though only the second edition, KotlinConf has already become something of an institution within the developer community. Events like these help you meet the people whose work you admire, but would otherwise not be able to meet in person.

[Pandas] Finding a Row Where One of Its Values Is at a Minimum/Maximum

Preslav Rachev

Share on

Leave a comment

You may also enjoy

Designing Well-Structured REST APIs with Flask-RestPlus: Part 1

KotlinConf 2018 Recap

A New Stochastic Oscillator?

A Photo Recap of the Past Week