Breakthrough Tech Lab 1


Part 1 of a series of notebooks for teaching ML to early college students in an 8-week summer lab session.

Lab 1 - Python Basics

The goal of this week's lab is to work through the basics of Python with a focus on the aspects that are important for data science and machine learning.

Python is pretty much the standard programming language for AI and machine learning these days. It is widely used in companies like Google or Netflix for their AI systems.

It is also what is used in research and most non-profit organizations. This is lucky because it is also one of the most fun programming languages!

This week we will walk through the basics of Python and notebooks.

  • Unit A: Types and Documentation
  • Unit B: Strings and Functions

Unit A

Working with types

This summer we will be working with lots of different types of data. Sometimes that data will be numerical, such as a temperature:

In [1]:
98.7
Out[1]:
98.7

Other times it will be a text string such as a name:

In [2]:
"New York City"
Out[2]:
'New York City'

More advanced cases will have lists of elements such as many names:

In [3]:
["Queens", "Brooklyn", "Manhattan", "Staten Island", "The Bronx"]
Out[3]:
['Queens', 'Brooklyn', 'Manhattan', 'Staten Island', 'The Bronx']

Python has many different types like this. Knowing the type of your data is the first step.

Let's look at some examples.

Numbers

Numbers are the simplest type we will work with. Simply make a variable name and assign a number value.

In [4]:
my_number_variable = 80
In [5]:
my_number_variable
Out[5]:
80

Here are two more.

In [6]:
number1 = 10.5
number2 = 20

Note: if you have learned a different programming language, such as Java, you might remember having to declare the type of a variable like

int number1 = 10

You don't have to do that in Python. The type is still there, but it is added automatically.
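To see which type Python picked, the built-in type function can be asked directly. A minimal sketch, reusing the two values from above (the names type_of_number1 and type_of_number2 are only illustrative and are not part of the lab):

```python
number1 = 10.5
number2 = 20

# type() reports the type that Python assigned automatically.
type_of_number1 = type(number1)
type_of_number2 = type(number2)

print(type_of_number1)  # <class 'float'>
print(type_of_number2)  # <class 'int'>
```

A number written with a decimal point becomes a float, and a whole number becomes an int.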

You can add two numbers together to make a new number.

In [7]:
number3 = number1 + number2

👩‍🎓Student Question: What value does number3 have?

In [8]:
#📝📝📝📝
pass

Strings

Strings are very easy to use in Python. You can just use quotes to create them.

In [9]:
string1 = "New York "
string2 = "City"

To combine two strings you simply add them together.

In [10]:
string3 = string1 + string2

👩‍🎓Student Question: What value does string3 have?

In [11]:
#📝📝📝📝
pass

Lists

Python has a simple type for multiple values called a list.

In [12]:
list1 = [1, 2, 3]
list2 = [4, 5]

Note: if you have learned a different programming language, such as Java, you might remember arrays. Python lists are like arrays but way easier. You don't need to declare their size or type. Just make them.
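As a rough illustration of "just make them", the sketch below grows a list one item at a time and builds another with mixed types. The variable names boroughs and mixed are made up for this example.

```python
# A list can start empty and grow; no size is declared up front.
boroughs = []
boroughs.append("Queens")
boroughs.append("Brooklyn")

# A list may also mix types, although one type per list is usually clearer.
mixed = [1, "two", 3.0]

print(len(boroughs))  # 2
print(mixed)          # [1, 'two', 3.0]
```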

Adding two lists together creates a new list combining the two.

In [13]:
list3 = list1 + list2

👩‍🎓Student Question: What value does list3 have?

In [14]:
#📝📝📝📝
pass

Dictionaries

A dictionary type stores "keys" and "values", and allows you to look up values using their corresponding keys.

You can have as many keys and values as you want, and they can be of most of the types that we have seen so far.

In [15]:
dict1 = {"apple": "red",
         "banana": "yellow"}
dict1
Out[15]:
{'apple': 'red', 'banana': 'yellow'}

To access a value of the dictionary, you use the square bracket notation with the key that you want to access.

In [16]:
dict1["apple"]
Out[16]:
'red'
In [17]:
dict1["banana"]
Out[17]:
'yellow'
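Looking up a key that is not in the dictionary raises a KeyError. Two common ways to avoid that are sketched below, using a separate dictionary named fruit_colors (a name chosen only for this example) so that dict1 is left alone.

```python
fruit_colors = {"apple": "red", "banana": "yellow"}

# "in" checks whether a key is present before looking it up.
has_apple = "apple" in fruit_colors
has_grape = "grape" in fruit_colors

# .get returns a fallback value instead of raising KeyError for a missing key.
grape_color = fruit_colors.get("grape", "unknown")

print(has_apple, has_grape, grape_color)  # True False unknown
```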

You can also add a new key to the dictionary by setting its value.

In [18]:
dict1["pear"] = "green"

👩‍🎓Student Question: What value does dict1 have?

In [19]:
#📝📝📝📝 FILLME
pass

Importing and Reading Docs

Numbers, strings, lists, and dictionaries. These are the most common types of data that we will use throughout the summer. Every programmer should know these basic types.

However, to be a really good programmer, it is important to be able to use types that you don't yet know. Most of the time the problem that you are interested in will have a type that is already made by someone else.

For instance, let's say we want a type for a date. We could try to write our own.

In [20]:
day = 8
month = "June"
year = 2021

But there are so many things we would need to add! How do we represent weeks? Leap years? How do we count number of days?

So instead let us use a package. To use a package first we import it.

In [21]:
import datetime

Then we use . notation to use the package. This gives us a date variable for the current day.

In [22]:
date1 = datetime.datetime.now()
date1
Out[22]:
datetime.datetime(2021, 6, 10, 13, 48, 54, 759587)
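The package also handles the questions raised earlier, such as counting days and leap years. A small sketch, using fixed dates chosen only for illustration (the names start, end, days_between, leap_gap, and normal_gap are not part of the lab):

```python
import datetime

# Two fixed dates, chosen only for illustration.
start = datetime.date(2021, 6, 8)
end = datetime.date(2021, 8, 3)

# Subtracting two dates gives a timedelta, which knows the number of days.
days_between = (end - start).days
print(days_between)  # 56

# Leap years are handled for us: 2020 had a February 29, 2021 did not.
leap_gap = (datetime.date(2020, 3, 1) - datetime.date(2020, 2, 28)).days
normal_gap = (datetime.date(2021, 3, 1) - datetime.date(2021, 2, 28)).days
print(leap_gap, normal_gap)  # 2 1
```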

How did I know how to do this?

Honestly, I had no idea. I completely forgot how this worked, so I did this.

  1. Googled "how do i get the current time in python"
  2. Clicked the link we get back here https://stackoverflow.com/questions/415511/how-to-get-the-current-time-in-python

This is one of the most important skills for learning Python :)

The format of the output of the line above tells us that we can access the day and month of the current date in the following manner.

In [23]:
date1.day
Out[23]:
10
In [24]:
date1.month
Out[24]:
6

👩‍🎓Student Question: Can you print the current value of the year?

In [25]:
#📝📝📝📝 FILLME
pass

Group Exercise A

Question 1

We saw that when we had a date type, it gave us the month as a number.

In [26]:
date1.month
Out[26]:
6

If we want to turn the months into more standard names we can do so by making a dictionary.

📝📝📝📝 FILLME months = { 1 : "Jan", ... }

Use your dictionary to convert the current month to a month name.

In [27]:
#📝📝📝📝 FILLME
pass

Question 2

One common data operation is to count unique items. For instance, if we are voting we may have a list of votes from each person voting for candidate A, B, or C.

In [28]:
votes = ["A", "B", "A", "A", "C", "C", "B", "A", "A", "A"]

There is a special type in Python, known as a Counter, that makes this operation really easy. It lives in a package known as collections. We can use it by importing it like this.

In [29]:
from collections import Counter

For this exercise, you should google for how to use the Counter. Use what you find to print out the number of votes that candidate "A" received.

In [30]:
#📝📝📝📝 FILLME
pass

Question 3

Another useful aspect of the Counter is that it can tell you the most common elements in a list. This is particularly useful when there are a ton of different elements to work with. Google for how to find the most common element in a list.

For this question, you will tell us the 10th most common letter in the beginning of the "Wizard of Oz".

In [31]:
wizard_of_oz = list("Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer’s wife. Their house was small, for the lumber to build it had to be carried by wagon many miles. There were four walls, a floor and a roof, which made one room; and this room contained a rusty looking cookstove, a cupboard for the dishes, a table, three or four chairs, and the beds. Uncle Henry and Aunt Em had a big bed in one corner, and Dorothy a little bed in another corner. There was no garret at all, and no cellar—except a small hole dug in the ground, called a cyclone cellar, where the family could go in case one of those great whirlwinds arose, mighty enough to crush any building in its path. It was reached by a trap door in the middle of the floor, from which a ladder led down into the small, dark hole.")
wizard_of_oz
Out[31]:
['D',
 'o',
 'r',
 'o',
 't',
 'h',
 'y',
 ' ',
 'l',
 'i',
 'v',
 'e',
 'd',
 ' ',
 'i',
 'n',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'm',
 'i',
 'd',
 's',
 't',
 ' ',
 'o',
 'f',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'g',
 'r',
 'e',
 'a',
 't',
 ' ',
 'K',
 'a',
 'n',
 's',
 'a',
 's',
 ' ',
 'p',
 'r',
 'a',
 'i',
 'r',
 'i',
 'e',
 's',
 ',',
 ' ',
 'w',
 'i',
 't',
 'h',
 ' ',
 'U',
 'n',
 'c',
 'l',
 'e',
 ' ',
 'H',
 'e',
 'n',
 'r',
 'y',
 ',',
 ' ',
 'w',
 'h',
 'o',
 ' ',
 'w',
 'a',
 's',
 ' ',
 'a',
 ' ',
 'f',
 'a',
 'r',
 'm',
 'e',
 'r',
 ',',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 'A',
 'u',
 'n',
 't',
 ' ',
 'E',
 'm',
 ',',
 ' ',
 'w',
 'h',
 'o',
 ' ',
 'w',
 'a',
 's',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'f',
 'a',
 'r',
 'm',
 'e',
 'r',
 '’',
 's',
 ' ',
 'w',
 'i',
 'f',
 'e',
 '.',
 ' ',
 'T',
 'h',
 'e',
 'i',
 'r',
 ' ',
 'h',
 'o',
 'u',
 's',
 'e',
 ' ',
 'w',
 'a',
 's',
 ' ',
 's',
 'm',
 'a',
 'l',
 'l',
 ',',
 ' ',
 'f',
 'o',
 'r',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'l',
 'u',
 'm',
 'b',
 'e',
 'r',
 ' ',
 't',
 'o',
 ' ',
 'b',
 'u',
 'i',
 'l',
 'd',
 ' ',
 'i',
 't',
 ' ',
 'h',
 'a',
 'd',
 ' ',
 't',
 'o',
 ' ',
 'b',
 'e',
 ' ',
 'c',
 'a',
 'r',
 'r',
 'i',
 'e',
 'd',
 ' ',
 'b',
 'y',
 ' ',
 'w',
 'a',
 'g',
 'o',
 'n',
 ' ',
 'm',
 'a',
 'n',
 'y',
 ' ',
 'm',
 'i',
 'l',
 'e',
 's',
 '.',
 ' ',
 'T',
 'h',
 'e',
 'r',
 'e',
 ' ',
 'w',
 'e',
 'r',
 'e',
 ' ',
 'f',
 'o',
 'u',
 'r',
 ' ',
 'w',
 'a',
 'l',
 'l',
 's',
 ',',
 ' ',
 'a',
 ' ',
 'f',
 'l',
 'o',
 'o',
 'r',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 'a',
 ' ',
 'r',
 'o',
 'o',
 'f',
 ',',
 ' ',
 'w',
 'h',
 'i',
 'c',
 'h',
 ' ',
 'm',
 'a',
 'd',
 'e',
 ' ',
 'o',
 'n',
 'e',
 ' ',
 'r',
 'o',
 'o',
 'm',
 ';',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 't',
 'h',
 'i',
 's',
 ' ',
 'r',
 'o',
 'o',
 'm',
 ' ',
 'c',
 'o',
 'n',
 't',
 'a',
 'i',
 'n',
 'e',
 'd',
 ' ',
 'a',
 ' ',
 'r',
 'u',
 's',
 't',
 'y',
 ' ',
 'l',
 'o',
 'o',
 'k',
 'i',
 'n',
 'g',
 ' ',
 'c',
 'o',
 'o',
 'k',
 's',
 't',
 'o',
 'v',
 'e',
 ',',
 ' ',
 'a',
 ' ',
 'c',
 'u',
 'p',
 'b',
 'o',
 'a',
 'r',
 'd',
 ' ',
 'f',
 'o',
 'r',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'd',
 'i',
 's',
 'h',
 'e',
 's',
 ',',
 ' ',
 'a',
 ' ',
 't',
 'a',
 'b',
 'l',
 'e',
 ',',
 ' ',
 't',
 'h',
 'r',
 'e',
 'e',
 ' ',
 'o',
 'r',
 ' ',
 'f',
 'o',
 'u',
 'r',
 ' ',
 'c',
 'h',
 'a',
 'i',
 'r',
 's',
 ',',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'b',
 'e',
 'd',
 's',
 '.',
 ' ',
 'U',
 'n',
 'c',
 'l',
 'e',
 ' ',
 'H',
 'e',
 'n',
 'r',
 'y',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 'A',
 'u',
 'n',
 't',
 ' ',
 'E',
 'm',
 ' ',
 'h',
 'a',
 'd',
 ' ',
 'a',
 ' ',
 'b',
 'i',
 'g',
 ' ',
 'b',
 'e',
 'd',
 ' ',
 'i',
 'n',
 ' ',
 'o',
 'n',
 'e',
 ' ',
 'c',
 'o',
 'r',
 'n',
 'e',
 'r',
 ',',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 'D',
 'o',
 'r',
 'o',
 't',
 'h',
 'y',
 ' ',
 'a',
 ' ',
 'l',
 'i',
 't',
 't',
 'l',
 'e',
 ' ',
 'b',
 'e',
 'd',
 ' ',
 'i',
 'n',
 ' ',
 'a',
 'n',
 'o',
 't',
 'h',
 'e',
 'r',
 ' ',
 'c',
 'o',
 'r',
 'n',
 'e',
 'r',
 '.',
 ' ',
 'T',
 'h',
 'e',
 'r',
 'e',
 ' ',
 'w',
 'a',
 's',
 ' ',
 'n',
 'o',
 ' ',
 'g',
 'a',
 'r',
 'r',
 'e',
 't',
 ' ',
 'a',
 't',
 ' ',
 'a',
 'l',
 'l',
 ',',
 ' ',
 'a',
 'n',
 'd',
 ' ',
 'n',
 'o',
 ' ',
 'c',
 'e',
 'l',
 'l',
 'a',
 'r',
 '—',
 'e',
 'x',
 'c',
 'e',
 'p',
 't',
 ' ',
 'a',
 ' ',
 's',
 'm',
 'a',
 'l',
 'l',
 ' ',
 'h',
 'o',
 'l',
 'e',
 ' ',
 'd',
 'u',
 'g',
 ' ',
 'i',
 'n',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'g',
 'r',
 'o',
 'u',
 'n',
 'd',
 ',',
 ' ',
 'c',
 'a',
 'l',
 'l',
 'e',
 'd',
 ' ',
 'a',
 ' ',
 'c',
 'y',
 'c',
 'l',
 'o',
 'n',
 'e',
 ' ',
 'c',
 'e',
 'l',
 'l',
 'a',
 'r',
 ',',
 ' ',
 'w',
 'h',
 'e',
 'r',
 'e',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'f',
 'a',
 'm',
 'i',
 'l',
 'y',
 ' ',
 'c',
 'o',
 'u',
 'l',
 'd',
 ' ',
 'g',
 'o',
 ' ',
 'i',
 'n',
 ' ',
 'c',
 'a',
 's',
 'e',
 ' ',
 'o',
 'n',
 'e',
 ' ',
 'o',
 'f',
 ' ',
 't',
 'h',
 'o',
 's',
 'e',
 ' ',
 'g',
 'r',
 'e',
 'a',
 't',
 ' ',
 'w',
 'h',
 'i',
 'r',
 'l',
 'w',
 'i',
 'n',
 'd',
 's',
 ' ',
 'a',
 'r',
 'o',
 's',
 'e',
 ',',
 ' ',
 'm',
 'i',
 'g',
 'h',
 't',
 'y',
 ' ',
 'e',
 'n',
 'o',
 'u',
 'g',
 'h',
 ' ',
 't',
 'o',
 ' ',
 'c',
 'r',
 'u',
 's',
 'h',
 ' ',
 'a',
 'n',
 'y',
 ' ',
 'b',
 'u',
 'i',
 'l',
 'd',
 'i',
 'n',
 'g',
 ' ',
 'i',
 'n',
 ' ',
 'i',
 't',
 's',
 ' ',
 'p',
 'a',
 't',
 'h',
 '.',
 ' ',
 'I',
 't',
 ' ',
 'w',
 'a',
 's',
 ' ',
 'r',
 'e',
 'a',
 'c',
 'h',
 'e',
 'd',
 ' ',
 'b',
 'y',
 ' ',
 'a',
 ' ',
 't',
 'r',
 'a',
 'p',
 ' ',
 'd',
 'o',
 'o',
 'r',
 ' ',
 'i',
 'n',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'm',
 'i',
 'd',
 'd',
 'l',
 'e',
 ' ',
 'o',
 'f',
 ' ',
 't',
 'h',
 'e',
 ' ',
 'f',
 'l',
 'o',
 'o',
 'r',
 ',',
 ' ',
 'f',
 'r',
 'o',
 'm',
 ' ',
 'w',
 'h',
 'i',
 'c',
 'h',
 ' ',
 'a',
 ' ',
 'l',
 'a',
 'd',
 'd',
 'e',
 'r',
 ' ',
 'l',
 'e',
 'd',
 ' ',
 'd',
 'o',
 'w',
 'n',
 ' ',
 'i',
 'n',
 't',
 'o',
 ' ',
 't',
 'h',
 'e',
 ' ',
 's',
 'm',
 'a',
 'l',
 'l',
 ',',
 ' ',
 'd',
 'a',
 'r',
 'k',
 ' ',
 'h',
 'o',
 'l',
 'e',
 '.']

Print out the 10th most common element in this list.

In [32]:
#📝📝📝📝 FILLME
pass

More than anything remember this. The best programmers use help the most! No one wins a prize for memorizing the most functions. If you want to be a good programmer, learn how to look things up quickly and ask the most questions.

Unit B

In addition to the data types and libraries, we will sometimes use Python to write our own code. In general when doing data science you should not have to write very long amounts of code, but there are some cases when it is useful.

Basic Structures

if statements

If statements check for a condition and run the code if it is true. In Python you need to indent the code under the if statement; without the indentation, Python reports an IndentationError instead of running it.

In [33]:
number3 = 10 + 75.0
In [34]:
if number3 > 50:
    print("number is greater than 50")
number is greater than 50
In [35]:
if number3 > 100:
    print("number is greater than 100")

You can also have a backup else code block that will run if the condition is not true.

In [36]:
if number3 > 100:
    print("number is greater than 100")
else:
    print("number is not greater than 100")
number is not greater than 100
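When there are more than two cases, an elif branch can sit between if and else. This is a small sketch that goes slightly beyond the cells above; the variable size is made up for the example.

```python
number3 = 10 + 75.0

# Conditions are checked top to bottom; only the first true branch runs.
if number3 > 100:
    size = "large"
elif number3 > 50:
    size = "medium"
else:
    size = "small"

print(size)  # medium
```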

for loops

For loops in Python are used to step through the items in a list one by one.

In [37]:
list3
Out[37]:
[1, 2, 3, 4, 5]

You indicate a for loop in the following manner. The code will be run 5 times with the variable value taking on a new value each time through the loop.

In [38]:
for value in list3:
    print("Next value is: ", value)
Next value is:  1
Next value is:  2
Next value is:  3
Next value is:  4
Next value is:  5

Note: unlike in some other languages, Python for loops always need a list (or another sequence) to walk through. This differs from languages where you have a counter variable.

However, Python also includes a nice shortcut that makes it easy to write for loops like this. The command range produces a sequence of numbers that starts from a value (0 by default) and stops right before the end value.

In [39]:
for val in range(10):
    print(val)
0
1
2
3
4
5
6
7
8
9
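The range command also accepts a start value and a step. The sketch below wraps each range in list() only so that the numbers can be printed at once; the variable names are illustrative.

```python
# range(stop) counts from 0 up to, but not including, stop.
first_five = list(range(5))

# range(start, stop) begins at start instead of 0.
three_to_six = list(range(3, 7))

# range(start, stop, step) moves ahead by step each time.
evens = list(range(0, 10, 2))

print(first_five)    # [0, 1, 2, 3, 4]
print(three_to_six)  # [3, 4, 5, 6]
print(evens)         # [0, 2, 4, 6, 8]
```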

👩‍🎓Student Question: Print out each month name from your month dictionary.

In [40]:
#📝📝📝📝 FILLME
for month in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]:
    pass

Working with Strings

Strings are an important special case. Throughout this class we will work a lot with text. We will start with simple examples and work our way up to artificial intelligence over text.

Text will also be represented with a string type. This is created with quotes.

In [41]:
str1 = "A sample string to get started"

Just like with lists, we can make a for loop over strings to get individual letters.

In [42]:
for letter in str1:
    print(letter)
A
 
s
a
m
p
l
e
 
s
t
r
i
n
g
 
t
o
 
g
e
t
 
s
t
a
r
t
e
d
In [43]:
vowels = ["a", "e", "i", "o", "u"]
for letter in str1:
    if letter in vowels:
        print(letter)
a
e
i
o
e
a
e

However, it is usually better to use one of the built-in functions in Python. Most of the time it is best to google for these, but here are some important ones to remember.

Split

Splits a string up into a list of strings based on a separator

In [44]:
str1 = "a:b:c"
list_of_splits = str1.split(":")
list_of_splits[1]
Out[44]:
'b'

Join

Joins a string back together from a list.

In [45]:
str1 = ",".join(list_of_splits)

👩‍🎓Student Question: What value does str1 have?

In [46]:
#📝📝📝📝 FILLME
pass

Replace

Replaces some part of a string.

In [47]:
original_str = "Item 1 | Item 2 | Item 3"
new_str = original_str.replace("|", ",")
new_str
Out[47]:
'Item 1 , Item 2 , Item 3'
In [48]:
new_str = original_str.replace("|", "")
new_str
Out[48]:
'Item 1  Item 2  Item 3'

In

Checks if one string contains another

In [49]:
original_str = "Item 1 | Item 2 | Item 3"
contains1 = "Item 2" in original_str
In [50]:
contains2 = "Item 4" in original_str

👩‍🎓Student Question: What values do contains1 and contains2 have?

In [51]:
#📝📝📝📝 FILLME
pass

Conversions

Converts between a string and a number

In [52]:
int1 = int("15")
int1
Out[52]:
15
In [53]:
decimal1 = float("15.50")
decimal1
Out[53]:
15.5
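Conversions matter most when numbers arrive as text, for instance after a split. A minimal sketch, with a made-up input string raw and illustrative variable names:

```python
raw = "3,4,5"

# split gives strings, so "3" + "4" would glue text together rather than add.
pieces = raw.split(",")

# Convert each piece to an int before doing arithmetic.
total = 0
for piece in pieces:
    total = total + int(piece)

# str goes the other way, from a number back to text.
message = "Total: " + str(total)
print(message)  # Total: 12
```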

Functions

Functions are small snippets of code that you may want to use multiple times.

In [54]:
def add_man(str1):
    return str1 + "man"
In [55]:
out = add_man("bat")
out
Out[55]:
'batman'

Most of the time, functions should not change the variables that are sent to them. For instance here we do not change the variable y.

In [56]:
y = "bat"
out = add_man(y)
out
Out[56]:
'batman'
In [57]:
y
Out[57]:
'bat'
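Strings can never be changed by a function, but lists can, which is where this advice matters. The sketch below contrasts the two styles; both function names are hypothetical and written only for this comparison.

```python
def with_man_added(words):
    # Build and return a new list; the list passed in is left untouched.
    return words + ["man"]

def add_man_in_place(words):
    # append changes the caller's list directly, which can be surprising.
    words.append("man")

original = ["bat"]
new_list = with_man_added(original)
print(original, new_list)  # ['bat'] ['bat', 'man']

add_man_in_place(original)
print(original)  # ['bat', 'man']
```

The first style is usually easier to reason about, because the variable that was passed in still holds what it held before the call.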

One interesting aspect of Python is that it lets you pass functions to functions. For instance, the built-in function map is a function that applies another function to each element of a list.

Assume we have a list like this.

In [58]:
word_list = ["spider", "bat", "super"]

If we want a list with man added to each we cannot run the following:

Doesn't work: add_man(word_list)

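However, the map function makes this work by applying add_man to each element. In Python 3, map returns a map object rather than a list, which is why the output below is not a list of words.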

In [59]:
out = map(add_man, word_list)
out
Out[59]:
<map at 0x7fc83427bdf0>
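The output above is a map object, not the words themselves, because map only does its work when the results are requested. A short sketch of how to see the results, repeating add_man and word_list so that it runs on its own (out2 is an extra name used only here):

```python
def add_man(str1):
    return str1 + "man"

word_list = ["spider", "bat", "super"]

# map is lazy: wrap it in list() to see the results.
out = list(map(add_man, word_list))
print(out)  # ['spiderman', 'batman', 'superman']

# A list comprehension is a common alternative that gives the same list.
out2 = [add_man(word) for word in word_list]
```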

Group Exercise B

Question 1

When processing real-world data it is very common to be given a complex string that contains many different items all smashed together.

In [60]:
real_word_string1 = "Sasha Rush,arush@cornell.edu,Roosevelt Island,NYC"

Use one of the string functions above to pull out the email from this string and print it.

In [61]:
#📝📝📝📝 FILLME
pass

Question 2

Now assume that we have a list of strings.

In [62]:
real_word_strings2 = ["Sasha Rush,arush@cornell.edu,Roosevelt Island,NY",
                     "Bill Jones,bjones@cornell.edu,Manhattan,NY",
                     "Sarah Jones,sjones@cornell.edu,Queens,NY"]

Write a for loop that does the following.

  • Steps through each string
  • Finds the email address
  • Prints out the email address
In [63]:
#📝📝📝📝 FILLME
pass

Question 3

Next we will assume that we have a list of strings where people come from different locations. Your goal is to step through the list and print out the emails of only the people who come from New York.

In [64]:
real_word_strings3 = ["Sasha Rush,arush@cornell.edu,Roosevelt Island,NY",
                      "Erica Zhou,ezhou@cornell.edu,Manhattan,NY",
                      "Jessica Peters,jpeters@cornell.edu,Miami,FL",
                      "Bill Jones,bjones@cornell.edu,Philadelpha,PA",
                      "Sarah Jones,sjones@cornell.edu,Queens,NY"]
In [65]:
#📝📝📝📝 FILLME
pass

Question 4

Finally, let's assume that we want to create a new list of strings. We are going to do this by adding one more element to each string.

Instead of "Sasha Rush,arush@cornell.edu,Roosevelt Island,NY" we want it to say => "Sasha Rush,arush@cornell.edu,Roosevelt Island,NY,Computer Science"

Your task is to add this last element to each one of the strings.

In [66]:
real_word_strings4 = ["Sasha Rush,arush@cornell.edu,Roosevelt Island,NY",
                      "Erica Zhou,ezhou@cornell.edu,Manhattan,NY",
                      "Jessica Peters,jpeters@cornell.edu,Miami,FL",
                      "Bill Jones,bjones@cornell.edu,Philadelpha,PA",
                      "Sarah Jones,sjones@cornell.edu,Queens,NY"]
In [67]:
#📝📝📝📝 FILLME
pass