Chapter 7: Hello filters
Until November 2016, the use and sale of marijuana for recreational purposes was illegal in California. That changed when voters approved Proposition 64, which asked if the practice ought to be legalized.
A yes vote supported legalization. A no vote opposed it. In the final tally, 57% of voters said yes.
Our next mission is to use the DataFrames containing campaign committees and contributors to figure out the biggest donors both for and against the measure.
To do that, the first thing we need to do is isolate the fundraising committees active on Proposition 64, which are now buried in the list of more than 100 groups active last November.
Filtering a DataFrame
The most common way to filter a DataFrame is to pass an expression as an “index” that can be used to decide which records should be kept and which discarded.
You write the expression by combining a column on your DataFrame with an “operator” like < and a value to compare the column against.
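To see what such an expression produces on its own, here is a minimal sketch. The toy DataFrame and its column names are made up for illustration and are not part of the campaign data.

```python
import pandas as pd

# Hypothetical toy data, not the real committee list
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "amount": [50, 150, 300],
})

# On its own, the expression returns one True or False value per row
mask = toy.amount < 200
print(mask)

# Passing that result between square brackets keeps only the True rows
small = toy[mask]
print(small)
```

The expression does not filter anything by itself. It only marks each row as a keeper or not, and the square brackets do the discarding.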
If you are familiar with writing SQL to manipulate databases, pandas’ filtering system is somewhat similar to a WHERE clause. The official pandas documentation offers direct translations between the two.
In our case, the column we want to filter against is prop_name. We only want to keep those records where the value there matches the full name of Proposition 64.
Where do we get that? Our friend value_counts.
Running the command we learned before to list and count all of the proposition names will spit out the full names of all 17 measures.
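As a reminder of what that looks like, here is a small sketch on made-up data. The toy_committees DataFrame, the committee names and the second measure name are hypothetical stand-ins. With the real data you would presumably run the same method on committee_list.prop_name.

```python
import pandas as pd

# Hypothetical stand-in for the real committee list
toy_committees = pd.DataFrame({
    "prop_name": [
        "PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.",
        "PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.",
        "HYPOTHETICAL MEASURE A",
    ],
    "committee_name": ["Committee X", "Committee Y", "Committee Z"],
})

# Count how many rows carry each distinct proposition name
counts = toy_committees.prop_name.value_counts()
print(counts)
```

The names appear as the labels of the result, most common first, which makes them easy to copy exactly as they are spelled in the data.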
From that result we can copy the full name of the proposition and place it between quotation marks in a variable in a new cell. This will allow us to reuse it later.
my_prop = 'PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.'
In the next cell we will ask pandas to narrow down our list of committees to just those that match the proposition we’re interested in. We will create a filter expression that looks like this: committee_list.prop_name == my_prop, and place it between two square brackets following the variable we want to filter. Place the following code in the next open cell in your notebook.
committee_list[committee_list.prop_name == my_prop]
Run it and it outputs the filtered dataset, just those committees active on Proposition 64.
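If you want to try the same pattern without the real CSV file, the following sketch runs on its own. The toy_list DataFrame, its committee names and the second measure name are invented for illustration. Only the prop_name column and the my_prop value come from this chapter.

```python
import pandas as pd

# Hypothetical stand-in for committee_list
toy_list = pd.DataFrame({
    "prop_name": [
        "PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.",
        "HYPOTHETICAL MEASURE A",
        "PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.",
    ],
    "committee_name": ["Committee X", "Committee Y", "Committee Z"],
})

my_prop = 'PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.'

# Keep only the rows where prop_name exactly equals my_prop
matches = toy_list[toy_list.prop_name == my_prop]
print(matches)
```

Note that == requires an exact match, including capitalization, spacing and punctuation. That is why copying the name straight from the value_counts output is safer than typing it by hand.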
Now we should save the results of that filter into a new variable, separate from the full list we imported from the CSV file.
Since it includes only the committees for the one proposition we picked, let’s call it my_committees.
my_committees = committee_list[committee_list.prop_name == my_prop]
To check our work and find out how many committees are left after the filter, let’s run the DataFrame inspection commands we learned earlier.
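Here is a rough sketch of that check, assuming the inspection commands in question are info and head. It uses a hypothetical toy_list in place of the real committee_list, so the counts it prints will not match yours.

```python
import pandas as pd

# Hypothetical stand-in for committee_list
toy_list = pd.DataFrame({
    "prop_name": [
        "PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.",
        "HYPOTHETICAL MEASURE A",
        "PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.",
    ],
    "committee_name": ["Committee X", "Committee Y", "Committee Z"],
})
my_prop = 'PROPOSITION 064- MARIJUANA LEGALIZATION. INITIATIVE STATUTE.'

# Save the filtered rows under a new name
toy_filtered = toy_list[toy_list.prop_name == my_prop]

# Summarize the columns and the number of entries
toy_filtered.info()

# Peek at the first few rows
print(toy_filtered.head())

# Compare the row counts before and after the filter
print(len(toy_list), len(toy_filtered))
```

Because the filter result went into a new variable, the full list is still intact and can be filtered again for a different measure.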