Sorting and Selecting Data

I. Sorting

1. Sorting by one variable

You can sort the rows by passing a column name to .sort_values().

df.sort_values(column)

For example

students.sort_values("grade")

To sort in descending order, we add ascending=False

In this example, the grade will be the highest to the lowest:

students.sort_values("grade", ascending=False)

2. Sorting by multiple variables

We can sort by multiple variables by passing a list of column names to sort_values()

students.sort_values("grade", "age")

To change the direction values are sorted in, pass a list to the ascending argument to specific which direction sorting.

students.sort_values("grade", "age", ascending=[True, False])

II. Select columns

Columns can be used in calculation and plotting data.

1. Selecting with brackets and strings

We select with DataFrame-name['column']

name = records['name']
print(name)

2. Selecting with a dot

If a column string only contains letters, numbers and underscores, we can use dot notation.

DataFrame-name.column-name

For example, with the DataFrame called students, we can select the name column with students.name.

Note in column selection

  • Use brackets and string for column names with spaces or special characters (-, ?, etc.)
report['Is day off?']
  • When using brackets and string, don’t forget the quotation mark around the column name
report['name']
  • Use square brackets, not parentheses

3. Selecting multiple columns

To select two or more columns from a DataFrame, we use a list of the column names. To create the DataFrame shown above, we would use:

new_df = table[['column1', 'column2']]

For example

new_df = students[['last_name', 'email']]

III. Select rows

DataFrames are zero-indexed, meaning that we start with the 0th row.

1. Using iloc[]

For example, to select 3rd row of students table, we use students.iloc[2]

We can also select multiple rows

  • students.iloc[2:5] selects all rows starting at the 2nd row and up to but not including the 5th row
  • students.iloc[:4] selects the first 4 rows (i.e., the 0th, 1st, 2nd, and 3rd rows)
  • students.iloc[-3:] selects the last 3 rows.

2. Selecting rows with logic

We can select rows when the statement is true.

df[df.MyColumnName == statement]

Recall that we use the following operators:

  • == tests that two values are equal.
  • != tests that two values are not equal.
  • > and < test that greater than or less than, respectively.
  • >= and <= test greater than or equal to or less than or equal to, respectively.
students[students["grade"] > 60]

3. Selecting rows by categorical variables

To select rows from multiple categories, we use | operator

For example, selects the row contains the data from March and April.

march_april = df[(df.month == 'March') | (df.month == 'April')]

4. Selecting rows using isin

We can filter multiple values of a categorical variable, the easiest way is to use the isin method.

For example

We can use the isin command to create the variable january_february_march, containing the data from January, February, and March.

january_february_march = df[df.month.isin(['January', 'February', 'March'])]

print(january_february_march)