Sorting and Selecting Data

I. Sorting

1. Sorting by one variable

You can sort the rows by passing a column name to .sort_values().


For example


To sort in descending order, we add ascending=False

In this example, the grade will be the highest to the lowest:

students.sort_values("grade", ascending=False)

2. Sorting by multiple variables

We can sort by multiple variables by passing a list of column names to sort_values()

students.sort_values("grade", "age")

To change the direction values are sorted in, pass a list to the ascending argument to specific which direction sorting.

students.sort_values("grade", "age", ascending=[True, False])

II. Select columns

Columns can be used in calculation and plotting data.

1. Selecting with brackets and strings

We select with DataFrame-name['column']

name = records['name']

2. Selecting with a dot

If a column string only contains letters, numbers and underscores, we can use dot notation.


For example, with the DataFrame called students, we can select the name column with

Note in column selection

  • Use brackets and string for column names with spaces or special characters (-, ?, etc.)
report['Is day off?']
  • When using brackets and string, don’t forget the quotation mark around the column name
  • Use square brackets, not parentheses

3. Selecting multiple columns

To select two or more columns from a DataFrame, we use a list of the column names. To create the DataFrame shown above, we would use:

new_df = table[['column1', 'column2']]

For example

new_df = students[['last_name', 'email']]

III. Select rows

DataFrames are zero-indexed, meaning that we start with the 0th row.

1. Using iloc[]

For example, to select 3rd row of students table, we use students.iloc[2]

We can also select multiple rows

  • students.iloc[2:5] selects all rows starting at the 2nd row and up to but not including the 5th row
  • students.iloc[:4] selects the first 4 rows (i.e., the 0th, 1st, 2nd, and 3rd rows)
  • students.iloc[-3:] selects the last 3 rows.

2. Selecting rows with logic

We can select rows when the statement is true.

df[df.MyColumnName == statement]

Recall that we use the following operators:

  • == tests that two values are equal.
  • != tests that two values are not equal.
  • > and < test that greater than or less than, respectively.
  • >= and <= test greater than or equal to or less than or equal to, respectively.
students[students["grade"] > 60]

3. Selecting rows by categorical variables

To select rows from multiple categories, we use | operator

For example, selects the row contains the data from March and April.

march_april = df[(df.month == 'March') | (df.month == 'April')]

4. Selecting rows using isin

We can filter multiple values of a categorical variable, the easiest way is to use the isin method.

For example

We can use the isin command to create the variable january_february_march, containing the data from January, February, and March.

january_february_march = df[df.month.isin(['January', 'February', 'March'])]