modin.pandas.DataFrame.nsmallest

DataFrame.nsmallest(n, columns, keep='first') DataFrame[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.26.0/snowpark-python/.tox/docs/lib/python3.9/site-packages/modin/pandas/dataframe.py#L1449-L1459)

Return the first n rows ordered by columns in ascending order.

Return the first n rows with the smallest values in columns, in ascending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=True).head(n)

Parameters:
  • n (int) – Number of items to retrieve.

  • columns (list or str) – Column name or names to order by.

  • keep ({'first', 'last', 'all'}, default 'first') –

    Where there are duplicate values:

    • first : take the first occurrence.

    • last : take the last occurrence.

    • all : keep all the ties of the largest item even if it means selecting more than n items.

Return type:

DataFrame

See also

DataFrame.nlargest

Return the first n rows ordered by columns in descending order.

DataFrame.sort_values

Sort DataFrame by the values.

DataFrame.head

Return the first n rows without re-ordering.

Examples

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 337000,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru         337000      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI
Copy

In the following example, we will use nsmallest to select the three rows having the smallest values in column “population”.

>>> df.nsmallest(3, 'population')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS
Copy

When using keep='last', ties are resolved in reverse order:

>>> df.nsmallest(3, 'population', keep='last')
          population  GDP alpha-2
Anguilla       11300  311      AI
Tuvalu         11300   38      TV
Nauru         337000  182      NR
Copy

When using keep='all', the number of element kept can go beyond n. if there are duplicate values for the largest element, all the ties are kept.

>>> df.nsmallest(3, 'population', keep='all')  
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS
Nauru         337000    182      NR
Copy

However, nsmallest does not keep n distinct smallest elements:

>>> df.nsmallest(4, 'population', keep='all')  
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS
Nauru         337000    182      NR
Copy

To order by the smallest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nsmallest(3, ['population', 'GDP'])
          population  GDP alpha-2
Tuvalu         11300   38      TV
Anguilla       11300  311      AI
Nauru         337000  182      NR
Copy
Language: English