Paul Barry's Website - Programming for Data Scientists - Semester 1

Module on the M.Sc. in Data Science


Academic Year 2016/17 - Class Log.

Wed 28 Sep 2016: Welcome! We (i.e., Paul) talked about what's in store for the next 12 weeks - lots and lots of Python. We also visited a number of websites during today's class.

For tomorrow's class, please download and install Python 3 on your laptop (and bring your laptop with you from now on).

Thurs 29 Sep 2016: Introduced IDLE and the >>> prompt. Looked (briefly) at PyCharm and WingIDE. Learned about the following built-in functions: print, dir, type, help, and len. Also explored the standard library (and imported random), as well as took a short look at The Python Package Index (PyPI). We're off and running...
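
By way of a flavour (the names and values here are illustrative, not the actual prompt session), the BIFs behave like this:

```python
# A taste of the built-in functions (BIFs) explored at the >>> prompt.
greeting = "Hello, Data Scientists!"

print(greeting)            # display a value on screen
print(type(greeting))      # every value has a type: <class 'str'>
print(len(greeting))       # how many characters?

import random              # a module from the standard library
# dir() lists an object's attributes; help() shows its documentation.
names = dir(random)
print("randint" in names)
```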

Fri 30 Sep 2016: Worked on our first complete Python program, which demonstrated Python's use of sequence, iteration (loops), and selection (if). We also used a number of useful Python built-in functions (BIFs), including range, len, dir, print, and list. We imported the random and datetime libraries, then used them to access some of Python's built-in standard library functionality. As a follow-up, we specified the "beer song" program and started working on it. Please send me your "beer song" solutions prior to our next scheduled class.
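
A minimal sketch of the kinds of things that first program did - the names and data here are made up, not the actual class code:

```python
# Sequence, iteration (a for loop), and selection (if), together with a
# few BIFs and two standard-library imports used in class.
import random
import datetime

numbers = list(range(1, 6))        # [1, 2, 3, 4, 5]
picked = random.choice(numbers)    # a random selection from the list

evens = []
for n in numbers:                  # iteration
    if n % 2 == 0:                 # selection
        evens.append(n)

today = datetime.date.today()      # sequence: one statement after another
print(len(numbers), evens, picked in numbers, today.year)
```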

Wed 5 Oct 2016: Looked at various solutions to the beersong problem. Lots of discussion as we reviewed our classmates' code on screen. We reviewed Paul's code, too.
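
For the record, here is one possible take on the beer song - a hedged sketch, and not necessarily any of the solutions we reviewed in class:

```python
# One way to structure the "beer song": a helper for the bottle phrase,
# plus a loop that counts down and collects the verses.
def bottles(n):
    """Return the phrase for n bottles of beer."""
    if n == 0:
        return "no more bottles"
    if n == 1:
        return "1 bottle"
    return "{} bottles".format(n)

def beer_song(start=99):
    lines = []
    for n in range(start, 0, -1):
        lines.append("{} of beer on the wall, {} of beer.".format(
            bottles(n), bottles(n)))
        lines.append("Take one down and pass it around, "
                     "{} of beer on the wall.".format(bottles(n - 1)))
    return "\n".join(lines)

print(beer_song(3))   # keep the output short for demonstration purposes
```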

Thurs 6 Oct 2016: Started our tour of the built-in data structures by doing a deep-dive into lists. See today's transcript for all the list-method details. [The beersong refuses to go away, with yet another solution being offered. But let's not mention it again, as there are lots of other examples to play with.]
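
A quick reminder of a few of the list methods from the deep-dive (illustrative data):

```python
# Common list methods in action.
langs = ["Python", "R", "Julia"]

langs.append("SQL")          # add to the end
langs.insert(0, "Scala")     # add at an index
langs.remove("R")            # remove by value
langs.sort()                 # in-place, alphabetical
copy_of = langs.copy()       # shallow copy
last = langs.pop()           # remove and return the final item

print(langs, last, copy_of)
```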

Fri 7 Oct 2016: Looked at dictionaries, sets, and tuples. Also looked (briefly) at reading data from a file and doing some simple manipulations of the read-in data. More of this type of thing (careful now) next week.
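
In miniature (with a made-up data file), the three structures and the file-reading pattern look like this:

```python
# Dictionaries, sets, and tuples in brief, plus a simple file read.
person = {"name": "Ada", "field": "Computing"}    # dict: key/value pairs
vowels = set("aeiou")                             # set: unique members only
point = (3, 4)                                    # tuple: immutable sequence

# Write a tiny data file, then read it back and manipulate the lines.
with open("todos.txt", "w") as f:
    f.write("install python\nlearn lists\nlearn dicts\n")

with open("todos.txt") as f:
    todos = f.read().splitlines()

print(person["name"], "a" in vowels, point[0], todos)
```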

Wed 12 Oct 2016: Worked with list, set, and dictionary comprehensions (and learned about maps and filters). Worked with Counters and defaultdict from the collections module (included in the standard library). Gearing up for the first piece of coursework which is coming up on Friday of this week.
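
A compact recap (with illustrative data) of the techniques covered:

```python
from collections import Counter, defaultdict

words = ["data", "science", "data", "python", "data"]

# Comprehensions build new collections in a single expression.
lengths = [len(w) for w in words]              # list comprehension
unique = {w for w in words}                    # set comprehension
first_letters = {w: w[0] for w in unique}      # dict comprehension

# map/filter do similar work, returning lazy iterators in Python 3.
upper = list(map(str.upper, words))
long_words = list(filter(lambda w: len(w) > 4, words))

# Counter tallies; defaultdict supplies a default for missing keys.
tally = Counter(words)
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)

print(tally.most_common(1), dict(by_letter))
```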

Thurs 13 Oct 2016: Worked through the Bahamas Buzzers example, taking raw data from a CSV file, before creating a custom dictionary to store the CSV file's data in Python. Created a small custom function (convert2ampm) to convert the 24hr-format time value into an AM or PM version of the time. Performed the dictionary manipulations as specified in the image files included with today's ZIP archive. Installed and then used the openpyxl module to read flight data directly from an Excel spreadsheet file. We are now ready for The Beaz...
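
The class files aren't reproduced here, so this sketch guesses at the shape of the code: a tiny in-memory CSV, a stand-in for convert2ampm (its exact logic may have differed), and the resulting dictionary:

```python
# Read CSV rows, build a dictionary, and convert 24hr times with a small
# helper. The flight data is invented for illustration.
import csv
import io

raw = io.StringIO("FLIGHT,TIME\nBB101,09:35\nBB202,14:10\n")

def convert2ampm(time24):
    """Convert 'HH:MM' (24hr) into an AM/PM string."""
    hour, minute = (int(part) for part in time24.split(":"))
    suffix = "AM" if hour < 12 else "PM"
    hour = hour % 12 or 12
    return "{}:{:02d}{}".format(hour, minute, suffix)

flights = {}
for row in csv.DictReader(raw):
    flights[row["FLIGHT"]] = convert2ampm(row["TIME"])

print(flights)
```

For the spreadsheet step, openpyxl's load_workbook function is the entry point for reading an .xlsx file directly.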

Fri 14 Oct 2016: Watched (and discussed) David Beazley's builtin superpowers talk, then distributed the description of the first piece of coursework (which is based on the data from David's talk).

Wed 19 Oct 2016: Learned about the Jupyter Notebook. Presented the problem we wish to solve: automatically processing "unstructured" HTML with Python (i.e., from HTML to a dictionary data structure).

Thurs 20 Oct 2016: Used the "requests" library to grab some HTML data from the O'Reilly media site, then started to process the response in an effort to extract a list of data science books published by year. We only got so far... the code is brittle and hard to read/use. But, we made some progress. See today's Notebook for more.
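
The brittle string-methods approach, sketched against a tiny made-up fragment (in class, requests.get(url).text supplied the real page):

```python
# Extract book titles from HTML using only string methods - workable,
# but fragile: any change to the markup breaks it.
html = """<ul>
<li class="book">Doing Data Science (2013)</li>
<li class="book">Python for Data Analysis (2012)</li>
</ul>"""

titles = []
for line in html.splitlines():
    start = line.find('class="book">')
    if start != -1:
        start += len('class="book">')
        end = line.find("</li>", start)
        titles.append(line[start:end])

print(titles)
```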

Fri 21 Oct 2016: We used BeautifulSoup - together with requests - to grab the target webpage, then used the "soup" to extract the data we need (all in 15 lines of code - see today's notebook). In the end, we created a dictionary of book titles by year.
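
BeautifulSoup itself isn't reproduced here; as a dependency-free stand-in, the same idea - extract titles, then group them into a dictionary keyed by year - can be sketched with the standard library's HTMLParser (the markup is invented, not the O'Reilly page):

```python
# Pull titles out of markup, then group them into a dict keyed by year.
from html.parser import HTMLParser

SAMPLE = ('<p class="title">Doing Data Science</p><p class="year">2013</p>'
          '<p class="title">Python for Data Analysis</p><p class="year">2012</p>')

class BookParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.kind = None          # the class attribute of the current tag
        self.pairs = []           # collected (title, year) tuples
        self._title = None

    def handle_starttag(self, tag, attrs):
        self.kind = dict(attrs).get("class")

    def handle_data(self, data):
        if self.kind == "title":
            self._title = data
        elif self.kind == "year":
            self.pairs.append((self._title, data))
        self.kind = None

parser = BookParser()
parser.feed(SAMPLE)

books_by_year = {}
for title, year in parser.pairs:
    books_by_year.setdefault(year, []).append(title)

print(books_by_year)
```

With BeautifulSoup, find_all does all of this heavy lifting, which is why the class version fit in about 15 lines.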

Wed 26 Oct 2016: Discussed working with JSON data (using the JSON library included in the Standard Library), then looked at working with XML data using the ElementTree parser (also included with Python). We then started to create a database table to host the CSV food inspections data (having created the foodDB database).
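
In brief (with an invented record), both formats parse into familiar Python structures using only the standard library:

```python
# The same small record as JSON text and as XML text, parsed with the
# standard library's json module and ElementTree parser.
import json
import xml.etree.ElementTree as ET

as_json = '{"premises": "Joe\'s Diner", "result": "Pass"}'
record = json.loads(as_json)          # JSON text -> Python dict

as_xml = ("<inspection><premises>Joe's Diner</premises>"
          "<result>Pass</result></inspection>")
root = ET.fromstring(as_xml)          # XML text -> Element tree
xml_record = {child.tag: child.text for child in root}

print(record == xml_record)
```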

Thurs 27 Oct 2016: Completed designing the "inspections" table, then looked at what was involved in using the DBcm module to interact with the data in the MySQL database from Python. We are now ready to take the Food Inspections CSV file and "move" it into MySQL.
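
DBcm's UseDatabase hides the connect/cursor/commit/close dance behind a context manager. As a runnable stand-in (no MySQL server required), here's the same pattern implemented over the standard library's sqlite3 - the table and data are invented for illustration:

```python
# A minimal context manager in the shape of DBcm.UseDatabase, backed by
# SQLite so it runs anywhere.
import sqlite3

class UseDatabase:
    def __init__(self, dbname):
        self.dbname = dbname

    def __enter__(self):
        self.conn = sqlite3.connect(self.dbname)
        self.cursor = self.conn.cursor()
        return self.cursor            # the with-block works with a cursor

    def __exit__(self, exc_type, exc_value, exc_tb):
        self.conn.commit()            # commit, then tidy up
        self.conn.close()

with UseDatabase(":memory:") as cursor:
    cursor.execute("create table inspections (premises text, result text)")
    cursor.execute("insert into inspections values (?, ?)",
                   ("Joe's Diner", "Pass"))
    cursor.execute("select result from inspections")
    rows = cursor.fetchall()

print(rows)
```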

Fri 28 Oct 2016: Having successfully interacted with our MySQL table yesterday, we were able to extract the data from the CSV file and load it into MySQL using Python (being careful to ensure any date values were converted into MySQL's YYYY-MM-DD format first). See today's Notebook for all the details. NEXT CLASS: Wed. 9th November 2016 (due to conferring, etc.).
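
The date conversion amounts to a strptime/strftime round-trip. The input format below is an assumption for illustration - adjust the format string to match the actual CSV:

```python
# Convert a CSV-style date (here assumed MM/DD/YYYY) into MySQL's
# YYYY-MM-DD format using the standard library's datetime module.
from datetime import datetime

def to_mysql_date(csv_date):
    return datetime.strptime(csv_date, "%m/%d/%Y").strftime("%Y-%m-%d")

print(to_mysql_date("10/28/2016"))
```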

Wed 9 Nov 2016: Distributed the next coursework description (based on web scraping) as well as the 28 tips article for using Jupyter Notebook.

Thurs 10 Nov 2016: Started exploring NumPy.

Fri 11 Nov 2016: Concluded introduction to NumPy, then started to look at pandas.
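
A first taste of both libraries (toy data, not the class examples):

```python
# NumPy's vectorised arrays and a pandas DataFrame built from a dict.
import numpy as np
import pandas as pd

scores = np.array([10, 20, 30, 40])
doubled = scores * 2                      # element-wise, no loop needed
mean = scores.mean()

df = pd.DataFrame({"student": ["Ann", "Bob"], "score": [55, 70]})
passed = df[df["score"] >= 60]            # boolean-mask row selection

print(doubled, mean, len(passed))
```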

Wed 16 Nov 2016: Discussion re: the best way to go about the assignment. Looked at some resources which can help. Tutorial session on Friday to look at some of the issues in more detail.

Fri 18 Nov 2016: Worked through one possible solution to Question 1 from Assignment #2. It took a while, but we got there in the end.

Wed 23 Nov 2016: Note that the deadline for submission of coursework #2 is now 9:00am on Monday, November 28th, 2016. We revisited some of the NumPy material today (relating to wines). More NumPy, then more pandas review tomorrow.

Thurs 24 Nov 2016: Looked at Python's string-formatting options, and did some more NumPy work. We are back to pandas in the AM.
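
The two main formatting options side by side (f-strings, a third option, arrive with Python 3.6):

```python
# Old-style % formatting versus str.format, producing the same result.
name, score = "Ann", 92.5

old_style = "%s scored %.1f%%" % (name, score)
new_style = "{} scored {:.1f}%".format(name, score)

print(old_style == new_style, old_style)
```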

Fri 25 Nov 2016: We reviewed pandas, then had more fun with pandas, then introduced plotting (with pandas and with matplotlib). We also discussed Paul's project which is taking advantage of some of the technologies we are using in class.
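
A minimal plotting sketch (invented data), using pandas' .plot wrapper, which delegates to matplotlib under the hood:

```python
# Chart a small Series as a bar plot and save it to a file. The Agg
# backend renders off-screen, so no display is needed.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

sales = pd.Series([3, 7, 5], index=["Mon", "Tue", "Wed"])

ax = sales.plot(kind="bar", title="Sales")   # pandas calls matplotlib
plt.savefig("sales.png")                     # write the chart to a file
```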

Wed 30 Nov 2016: Ran through a "basic" analysis of the Dynomed data using Python, NumPy, and pandas (with a little bit of matplotlib thrown in for good measure).

Thurs 01 Dec 2016: More discussion about the Dynomed data. We even discussed lambda.
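
lambda in two typical roles (toy data):

```python
# lambda creates a small anonymous function - handy as a sort key or
# inside map/filter.
readings = [("sensor_a", 7.2), ("sensor_b", 3.1), ("sensor_c", 5.9)]

# Sort by the numeric reading rather than the name.
by_value = sorted(readings, key=lambda pair: pair[1])

double = lambda x: x * 2     # equivalent to: def double(x): return x * 2
print(by_value[0], double(21))
```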

Fri 02 Dec 2016: One last go at Frank, only to discover that Frank's sensors are all over the place: he needs to be recalibrated (ouch!). Spent some time looking at matplotlib.

Wed 07 Dec 2016: Assignment #3 distributed (this is the last piece of coursework).

Thurs 08 Dec 2016: Worked through the latest "Frank" data and notebook, which summarised the data into a single Excel spreadsheet.

Fri 09 Dec 2016: Work on Assignment #3 begins...

Wed 14 Dec 2016: Assignment #3 continues...

Thurs 15 Dec 2016: Class rescheduled to 12:00 to coincide with Agnes' class exam.

Fri 16 Dec 2016: Methinks it's Christmas.

Return to the Courses page.