You can sort the rows by passing a column name to .sort_values().
To sort in descending order, we add ascending=False.
For example, sorting by grade with ascending=False orders the rows from the highest grade to the lowest.
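A minimal sketch of the descending sort described above. The students table and its values are hypothetical sample data, made up only for illustration.

```python
import pandas as pd

# Hypothetical sample data; names and grades are assumptions for illustration.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "grade": [72, 95, 58, 88],
})

# sort_values returns a new, sorted DataFrame; students itself is not modified.
by_grade_desc = students.sort_values("grade", ascending=False)
print(by_grade_desc)
```

Because the result is a new DataFrame, assign it to a variable if the sorted order is needed later.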
We can sort by multiple variables by passing a list of column names to .sort_values().
To change the direction in which values are sorted, pass a list to the ascending argument to specify the sort direction for each column.
students.sort_values(["grade", "age"], ascending=[True, False])
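The sketch below shows how a two-column sort breaks ties. The data is hypothetical and chosen so that grades repeat: rows are ordered by grade from low to high, and rows with equal grades are ordered by age from high to low.

```python
import pandas as pd

# Hypothetical sample data with repeated grades so the tie-break is visible.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "grade": [88, 72, 88, 72],
    "age": [15, 16, 17, 14],
})

# One entry in ascending per column in the list:
# grade ascending, then age descending within each grade.
ordered = students.sort_values(["grade", "age"], ascending=[True, False])
print(ordered)
```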
Columns can be used in calculations and in plotting data.
We select a single column with bracket notation:
name = records['name']
print(name)
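If a column name contains only letters, numbers, and underscores, we can also use dot notation.
For example, with the DataFrame called students, we can select the name column with students.name.
Note that in column selection, dot notation does not work when the column name contains spaces or special characters; use bracket notation instead: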
report['Is day off?']
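As a rough comparison of the two selection styles, the sketch below builds a small hypothetical report table (its contents are assumptions) and selects columns both ways.

```python
import pandas as pd

# Hypothetical table; one column name contains spaces and punctuation.
report = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "Is day off?": [True, False],
})

by_bracket = report["name"]      # bracket notation works for any column name
by_dot = report.name             # dot notation works for simple names only
day_off = report["Is day off?"]  # spaces and "?" require bracket notation

print(by_bracket.equals(by_dot))
print(day_off)
```

Both styles return the same pandas Series. Bracket notation is the safer default, since dot notation can also collide with existing DataFrame attributes and methods.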
To select two or more columns from a DataFrame, we use a list of the column names inside the selection brackets. The general pattern is shown first, followed by an example that selects the last_name and email columns from students:
new_df = table[['column1', 'column2']]
new_df = students[['last_name', 'email']]
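A short sketch of multi-column selection, assuming a hypothetical students table with first_name, last_name, and email columns. The outer brackets perform the selection and the inner brackets form the list of names.

```python
import pandas as pd

# Hypothetical sample data; the values are placeholders.
students = pd.DataFrame({
    "first_name": ["Ana", "Ben", "Cara"],
    "last_name": ["Lopez", "Kim", "Osei"],
    "email": ["ana@example.com", "ben@example.com", "cara@example.com"],
})

# Passing a list of names returns a DataFrame with only those columns,
# in the order given in the list.
new_df = students[["last_name", "email"]]
print(new_df)
```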
DataFrames are zero-indexed, meaning that we start with the 0th row.
For example, to select the 3rd row of the students table, we use .iloc with that row's zero-based position.
We can also select multiple rows by passing a slice to .iloc:
students.iloc[2:5] selects all rows starting at the 2nd row and up to but not including the 5th row.
students.iloc[:4] selects the first 4 rows (i.e., the 0th, 1st, 2nd, and 3rd rows).
students.iloc[-3:] selects the last 3 rows.
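The slices above can be checked on a small table. This sketch assumes a hypothetical seven-row students table whose names are single letters, so the selected positions are easy to read off.

```python
import pandas as pd

# Hypothetical table with seven rows at positions 0 through 6.
students = pd.DataFrame({"name": ["A", "B", "C", "D", "E", "F", "G"]})

single = students.iloc[2]        # one row, returned as a Series (position 2)
middle = students.iloc[2:5]      # positions 2, 3, 4 (5 is excluded)
first_four = students.iloc[:4]   # positions 0, 1, 2, 3
last_three = students.iloc[-3:]  # the final three rows

print(single["name"])
print(middle["name"].tolist())
print(first_four["name"].tolist())
print(last_three["name"].tolist())
```

Position 2 holds the third row when counting from one, which is why zero-based indexing is worth keeping in mind when reading row numbers.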
We can select only the rows for which a logical statement is true.
df[df.MyColumnName == statement]
Recall that we use the following operators:
== tests that two values are equal.
!= tests that two values are not equal.
> and < test whether one value is greater than or less than another, respectively.
>= and <= test whether one value is greater than or equal to, or less than or equal to, another, respectively.
students[students["grade"] > 60]
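A minimal sketch of row filtering with a comparison, using hypothetical grades. The comparison produces a Series of True/False values, and indexing with that Series keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical sample data; one grade sits exactly on the boundary.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "grade": [72, 95, 58, 60],
})

mask = students["grade"] > 60          # boolean Series, one value per row
above_60 = students[mask]              # strictly greater than 60
at_least_60 = students[students["grade"] >= 60]  # includes the boundary

print(mask.tolist())
print(above_60)
print(at_least_60)
```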
To select rows that match any of several conditions, we use the | operator, which means "or".
For example, the following selects the rows that contain the data from March or April.
march_april = df[(df.month == 'March') | (df.month == 'April')]
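The sketch below runs the same kind of "or" filter on a hypothetical table; the sales column and its values are assumptions. Each condition needs its own parentheses because | binds more tightly than == in Python.

```python
import pandas as pd

# Hypothetical monthly data.
df = pd.DataFrame({
    "month": ["January", "March", "April", "May", "March"],
    "sales": [10, 20, 30, 40, 50],
})

# A row is kept if either condition is true for that row.
march_april = df[(df.month == "March") | (df.month == "April")]
print(march_april)
```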
To filter on multiple values of a categorical variable, the easiest way is to use the isin method.
We can use the isin method to create the variable january_february_march, containing the data from January, February, and March.
january_february_march = df[df.month.isin(['January', 'February', 'March'])]
print(january_february_march)
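As an illustration, this sketch applies isin to a hypothetical table (the sales values are assumptions) and compares the result with the equivalent chain of | conditions.

```python
import pandas as pd

# Hypothetical monthly data.
df = pd.DataFrame({
    "month": ["January", "February", "March", "April"],
    "sales": [10, 20, 30, 40],
})

# isin marks a row True when its month appears in the given list.
january_february_march = df[df.month.isin(["January", "February", "March"])]

# The same filter written with |, which grows longer with every extra value.
same_rows = df[
    (df.month == "January") | (df.month == "February") | (df.month == "March")
]

print(january_february_march)
print(january_february_march.equals(same_rows))
```

Both forms select the same rows here; isin tends to be easier to read and maintain once there are more than two values.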