Breakthrough Tech Lab 1
Part 1 of a series of notebooks for teaching ML to early college students in an 8 week summer lab session.
Lab 1 - Python Basics¶
The goal of this week's lab is to work through the basics of Python with a focus on the aspects that are important for data science and machine learning.
Python is pretty much the standard programming language for AI and machine learning these days. It is widely used in companies like Google or Netflix for their AI systems.
It is also what is used in research and most non-profit organizations. This is lucky because it is also one of the most fun programming languages!
This week we will walk through the basics of Python and notebooks.
- Unit A: Types and Documentation
- Unit B: Strings and Functions
Unit A¶
Working with types¶
This summer we will be working with lots of different types of data. Sometimes that data will be numerical, such as a temperature:
98.7
Other times it will be a text string such as a name:
"New York City"
More advanced cases will have lists of elements such as many names:
["Queens", "Brooklyn", "Manhattan", "Staten Island", "The Bronx"]
Python has many different types like this. Knowing the type of your data is the first step.
Let's look at some examples.
Numbers¶
Numbers are the simplest type we will work with. Simply make a variable name and assign a number value.
my_number_variable = 80
my_number_variable
Here are two more.
number1 = 10.5
number2 = 20
Note: if you have learned a different programming language, such as Java, you might remember having to declare the type of a variable like
int number1 = 10
You don't have to do that in Python. The type is still there, but it is added automatically.
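To see the type that Python assigned, you can call the built-in type function on any value. A minimal sketch, reusing the values from above (the names example_float and example_int are made up for this illustration):

```python
# Python infers the type from the value on the right-hand side.
example_float = 10.5   # same value as number1
example_int = 20       # same value as number2

print(type(example_float))  # <class 'float'>
print(type(example_int))    # <class 'int'>
```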
You can add two numbers together to make a new number.
number3 = number1 + number2
Student Question: What value does number3 have?
# FILLME
pass
Strings¶
Strings are very easy to use in Python. You can just use quotes to create them.
string1 = "New York "
string2 = "City"
To combine two strings you simply add them together.
string3 = string1 + string2
Student Question: What value does string3 have?
# FILLME
pass
Lists¶
Python has a simple type for multiple values called a list.
list1 = [1, 2, 3]
list2 = [4, 5]
Note: if you have learned a different programming language, such as Java, you might remember arrays. Python lists are like arrays but way easier. You don't need to declare their size or type. Just make them.
Adding two lists together creates a new list combining the two.
list3 = list1 + list2
Student Question: What value does list3 have?
# FILLME
pass
Dictionaries¶
A dictionary type stores "keys" and "values", and allows you to look up values using their corresponding keys.
You can have as many keys and values as you want, and they can be of most of the types that we have seen so far.
dict1 = {"apple": "red",
"banana": "yellow"}
dict1
To access a value of the dictionary, you use the square bracket notation with the key that you want to access.
dict1["apple"]
dict1["banana"]
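Here is a short sketch of what to do when a key might be missing, using a separate example dictionary (fruit_colors is a made-up name, not part of the lab). The in check and the get method are both standard dictionary features.

```python
fruit_colors = {"apple": "red", "banana": "yellow"}

# Square brackets raise a KeyError if the key is not there,
# so it can help to check with `in` first.
has_apple = "apple" in fruit_colors
has_grape = "grape" in fruit_colors

# get returns a fallback value instead of raising an error.
grape_color = fruit_colors.get("grape", "unknown")

print(has_apple, has_grape, grape_color)  # True False unknown
```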
You can also add a new key to the dictionary by setting its value.
dict1["pear"] = "green"
Student Question: What value does dict1 have?
# FILLME
pass
Importing and Reading Docs¶
Numbers, strings, lists, and dictionaries. These are the most common types of data that we will use throughout the summer. Every programmer should know these basic types.
However, to be a really good programmer, it is important to be able to use types that you don't yet know. Most of the time the problem that you are interested in will have a type that is already made by someone else.
For instance, let's say we want a type for a date. We could try to write our own.
day = 8
month = "June"
year = 2021
But there are so many things we would need to add! How do we represent weeks? Leap years? How do we count number of days?
So instead let us use a package. To use a package, first we import it.
import datetime
Then we use dot (.) notation to use the package. This gives us a date variable for the current day.
date1 = datetime.datetime.now()
date1
How did I know how to do this?
Honestly, I had no idea. I completely forgot how this worked, so I did this.
- Googled "how do i get the current time in python"
- Clicked the link that came back: https://stackoverflow.com/questions/415511/how-to-get-the-current-time-in-python
This is one of the most important skills for learning Python :)
The format of the output of the line above is telling us that we can access the day and month of the current date in the following manner.
date1.day
date1.month
Student Question: Can you print the current value of the year?
# FILLME
pass
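Returning to the questions raised earlier about leap years and counting days: the sketch below is a rough illustration of what the package handles for us. The two dates are chosen only for the example; datetime.date and calendar.isleap are part of the Python standard library.

```python
import calendar
import datetime

start = datetime.date(2021, 6, 8)
end = datetime.date(2021, 8, 3)

# Subtracting two dates gives a timedelta, which knows the number of days.
days_between = (end - start).days
print(days_between)  # 56

# The calendar package can answer the leap-year question.
print(calendar.isleap(2020))  # True
print(calendar.isleap(2021))  # False

# weekday() counts from Monday = 0, so 1 means Tuesday.
print(start.weekday())  # 1
```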
Group Exercise A¶
Question 1¶
We saw that when we had a date type, it gave us the month as a number.
date1.month
If we want to turn the months into more standard names we can do so by making a dictionary.
FILLME months = { 1 : "Jan", ... }
Use your dictionary to convert the current month to a month name.
# FILLME
pass
Question 2¶
One common data operation is to count unique items. For instance, if we are voting, we may have a list of votes from each person voting for candidate A, B, or C.
votes = ["A", "B", "A", "A", "C", "C", "B", "A", "A", "A"]
There is a special type in Python, known as a Counter, that makes this operation really easy. It lives in a package known as collections. We can use it by importing it like this.
from collections import Counter
For this exercise, you should google for how to use the Counter. Use what you find to print out the number of votes that candidate "A" received.
# FILLME
pass
Question 3¶
Another useful aspect of the Counter is that it can tell you the most common elements in a list. This is particularly useful when there are a ton of different elements to work with. Google for how to find the most common element in a list.
For this question, you will tell us the 10th most common letter in the beginning of the "Wizard of Oz".
wizard_of_oz = list("Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife. Their house was small, for the lumber to build it had to be carried by wagon many miles. There were four walls, a floor and a roof, which made one room; and this room contained a rusty looking cookstove, a cupboard for the dishes, a table, three or four chairs, and the beds. Uncle Henry and Aunt Em had a big bed in one corner, and Dorothy a little bed in another corner. There was no garret at all, and no cellar--except a small hole dug in the ground, called a cyclone cellar, where the family could go in case one of those great whirlwinds arose, mighty enough to crush any building in its path. It was reached by a trap door in the middle of the floor, from which a ladder led down into the small, dark hole.")
wizard_of_oz
Print out the 10th most common element in this list.
# FILLME
pass
More than anything remember this. The best programmers use help the most! No one wins a prize for memorizing the most functions. If you want to be a good programmer, learn how to look things up quickly and ask the most questions.
Unit B¶
In addition to the data types and libraries, we will sometimes use Python to write our own code. In general, when doing data science you should not have to write very long programs, but there are some cases when it is useful.
Basic Structures¶
if statements¶
If statements check for a condition and run the code if it is true. In Python you must indent the code under the if statement. The indentation is how Python knows which lines belong to the if; leaving it out causes an IndentationError.
number3 = 10 + 75.0
if number3 > 50:
    print("number is greater than 50")
if number3 > 100:
    print("number is greater than 100")
You can also have a backup else code block that will run if the condition is not true.
if number3 > 100:
    print("number is greater than 100")
else:
    print("number is not greater than 100")
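Here is a small sketch of how indentation decides which lines belong to the if and else blocks. The variable temperature and its value are made up for the example.

```python
temperature = 98.7

if temperature > 100:
    # Both indented lines belong to the if block.
    message = "fever"
    print("temperature is greater than 100")
else:
    # This block runs because 98.7 is not greater than 100.
    message = "normal"
    print("temperature is not greater than 100")

# This line is not indented, so it always runs.
print("message is:", message)
```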
for loops¶
For loops in Python are used to step through the items in a list one by one.
list3
You indicate a for loop in the following manner. The code will be run 5 times, with the variable value taking on a new value each time through the loop.
for value in list3:
    print("Next value is: ", value)
Note: unlike in some other languages, Python for loops always need a list (or another sequence) to walk through. This differs from languages where you have a counter variable.
However, Python also includes a nice shortcut that makes it easy to write for loops like this. The command range produces a sequence of numbers that starts from a value (0 by default) and stops right before the end value.
for val in range(10):
    print(val)
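The start and end values can also be given explicitly. The sketch below wraps range in list only to display the numbers it produces; the variable names are illustrative.

```python
# One argument: start at 0 and stop right before 5.
first_five = list(range(5))
print(first_five)  # [0, 1, 2, 3, 4]

# Two arguments: start at 3 and stop right before 7.
three_to_six = list(range(3, 7))
print(three_to_six)  # [3, 4, 5, 6]

# range is handy for adding up numbers in a loop.
total = 0
for val in range(1, 5):
    total = total + val
print(total)  # 10
```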
Student Question: Print out each month name from your month dictionary.
# FILLME
for month in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]:
    pass
Working with Strings¶
Strings are an important special case. Throughout this class we will work a lot with text. We will start with simple examples and work our way up to artificial intelligence over text.
Text will also be represented with a string type. This is created with quotes.
str1 = "A sample string to get started"
Just like with lists, we can make a for loop over strings to get individual letters.
for letter in str1:
    print(letter)
vowels = ["a", "e", "i", "o", "u"]
for letter in str1:
    if letter in vowels:
        print(letter)
However, most of the time it will be better to use one of the built-in functions in Python. It is usually best to google for these, but here are some important ones to remember.
Split¶
Splits a string up into a list of strings based on a separator
str1 = "a:b:c"
list_of_splits = str1.split(":")
list_of_splits[1]
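Another sketch of split on a different string (date_string and words are made-up names for this example). It also shows that calling split with no separator breaks the string on whitespace.

```python
date_string = "2021-06-08"

# Split on the dash to get the three parts as strings.
parts = date_string.split("-")
print(parts)     # ['2021', '06', '08']
print(parts[0])  # 2021

# With no separator, split breaks the string on whitespace.
words = "New York City".split()
print(words)     # ['New', 'York', 'City']
```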
Join¶
Joins a string back together from a list.
str1 = ",".join(list_of_splits)
Student Question: What value does str1 have?
# FILLME
pass
Replace¶
Replaces some part of a string.
original_str = "Item 1 | Item 2 | Item 3"
new_str = original_str.replace("|", ",")
new_str
new_str = original_str.replace("|", "")
new_str
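One detail worth seeing in code: replace returns a new string and leaves the original alone, which is why the two calls above can both start from original_str. A minimal sketch with made-up variable names:

```python
label = "Item 1 | Item 2"

# replace returns a new string; label itself is unchanged.
cleaned = label.replace(" | ", ", ")
print(cleaned)  # Item 1, Item 2
print(label)    # Item 1 | Item 2

# Calls can be chained because each one returns a string.
shouted = label.replace("|", "and").replace("Item", "ITEM")
print(shouted)  # ITEM 1 and ITEM 2
```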
In¶
Checks if one string contains another
original_str = "Item 1 | Item 2 | Item 3"
contains1 = "Item 2" in original_str
contains2 = "Item 4" in original_str
Student Question: What values do contains1 and contains2 have?
# FILLME
pass
Conversions¶
Converts between a string and a number
int1 = int("15")
int1
decimal1 = float("15.50")
decimal1
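Conversions also work in the other direction with the built-in str function. The sketch below uses made-up variable names and notes one common surprise: int cannot read a string that contains a decimal point.

```python
# str goes the other way, from a number to a string.
text = str(15)
print(text + "0")  # 150, because this is string concatenation

# int("15.50") would raise a ValueError, since the text is not a whole number.
# Converting to float first and then to int drops the decimal part.
whole = int(float("15.50"))
print(whole)  # 15
```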
Functions¶
Functions are small snippets of code that you may want to use multiple times.
def add_man(str1):
    return str1 + "man"
out = add_man("bat")
out
Most of the time, functions should not change the variables that are sent to them. For instance, here we do not change the variable y.
y = "bat"
out = add_man(y)
out
y
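This matters more with lists, which a function can change in place. A sketch with two hypothetical helper functions (neither is part of the lab) that shows the difference:

```python
def add_item_copy(items, new_item):
    # Builds and returns a new list; the caller's list is left alone.
    return items + [new_item]

def add_item_in_place(items, new_item):
    # append changes the caller's list directly.
    items.append(new_item)

heroes = ["bat", "spider"]

longer = add_item_copy(heroes, "super")
print(longer)  # ['bat', 'spider', 'super']
print(heroes)  # ['bat', 'spider']

add_item_in_place(heroes, "super")
print(heroes)  # ['bat', 'spider', 'super']
```

The first style is usually easier to reason about, because the value of heroes only changes where you can see an assignment or an explicit call that says so.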
One interesting aspect of Python is that it lets you pass functions to functions. For instance, the built-in function map is a function that applies another function to each element of a list.
Assume we have a list like this.
word_list = ["spider", "bat", "super"]
If we want a list with man added to each, we cannot run the following:
Doesn't work: add_man(word_list)
However, the map function makes this work. In Python 3, map returns a lazy map object rather than a list, so we wrap it in list to create the new list.
out = list(map(add_man, word_list))
out
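For comparison, here is a sketch of the same result written three ways. add_man is repeated so the block runs on its own; the list comprehension in the third version is a common alternative that this lab has not introduced.

```python
def add_man(str1):
    return str1 + "man"

word_list = ["spider", "bat", "super"]

# 1. map, wrapped in list so the results are computed and shown.
with_map = list(map(add_man, word_list))

# 2. An explicit for loop that builds the list one item at a time.
with_loop = []
for word in word_list:
    with_loop.append(add_man(word))

# 3. A list comprehension.
with_comprehension = [add_man(word) for word in word_list]

print(with_map)  # ['spiderman', 'batman', 'superman']
print(with_map == with_loop == with_comprehension)  # True
```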
Group Exercise B¶
Question 1¶
When processing real-world data it is very common to be given a complex string that contains many different items all smashed together.
real_word_string1 = "Sasha Rush,arush@cornell.edu,Roosevelt Island,NYC"
Use one of the string functions above to pull out the email from this string and print it.
# FILLME
pass
Question 2¶
Now assume that we have a list of strings.
real_word_strings2 = ["Sasha Rush,arush@cornell.edu,Roosevelt Island,NY",
"Bill Jones,bjones@cornell.edu,Manhattan,NY",
"Sarah Jones,sjones@cornell.edu,Queens,NY"]
Write a for loop that does the following.
- Steps through each string
- Finds the email address
- Prints out the email address
# FILLME
pass
Question 3¶
Next we will assume that we have a list of strings where people come from different locations. Your goal is to step through the list and print out the emails of only the people who come from New York.
real_word_strings3 = ["Sasha Rush,arush@cornell.edu,Roosevelt Island,NY",
"Erica Zhou,ezhou@cornell.edu,Manhattan,NY",
"Jessica Peters,jpeters@cornell.edu,Miami,FL",
"Bill Jones,bjones@cornell.edu,Philadelpha,PA",
"Sarah Jones,sjones@cornell.edu,Queens,NY"]
# FILLME
pass
Question 4¶
Finally, let's assume that we want to create a new list of strings. We are going to do this by adding one more element to each string.
Instead of "Sasha Rush,arush@cornell.edu,Roosevelt Island,NY" we want it to say => "Sasha Rush,arush@cornell.edu,Roosevelt Island,NY,Computer Science"
Your task is to add this last element to each one of the strings.
real_word_strings4 = ["Sasha Rush,arush@cornell.edu,Roosevelt Island,NY",
"Erica Zhou,ezhou@cornell.edu,Manhattan,NY",
"Jessica Peters,jpeters@cornell.edu,Miami,FL",
"Bill Jones,bjones@cornell.edu,Philadelpha,PA",
"Sarah Jones,sjones@cornell.edu,Queens,NY"]
# FILLME
pass