Paul Barry's Website

M.Sc. in Data Science: Programming for Data Scientists - Semester 1

Academic Year 2018/19 - Class Log.


Wed 26 Sep 2018: Welcome! We found out that Linux - specifically Ubuntu (or Xubuntu if your laptop is a little older) - is the recommended platform for this course. Anaconda for Python 3 is the recommended Python distribution. Paul discussed the indicative content, installed Anaconda onto his Linux machine, and demonstrated a very simple Jupyter Notebook. He also talked a lot (probably too much). Tomorrow, we'll all start to learn Python programming in anger.

Thurs 27 Sep 2018: After a shaky start, we managed to connect to the Internet and start our Python travels. We learned about numbers (int), words (string), and truth values (boolean), then started to discuss the Three S's: sequence, iteration, and selection.

Fri 28 Sep 2018: Continued to learn about the Three S's... with more on iteration, and then selection. See today's notebook for all the details.
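
For the record, here's a wee sketch of all Three S's in one go (the temperatures are invented for illustration):

    # Sequence: statements run one after another.
    temperatures = [12, 15, 9, 21, 18]   # made-up sample data

    # Iteration: visit each value in turn.
    for temp in temperatures:
        # Selection: decide what to do based on a condition.
        if temp > 14:
            print(temp, "is warm(ish)")
        else:
            print(temp, "is chilly")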

Wed 03 Oct 2018: Learned about input-processing-output, taking data from input sources (keyboard, files) and processing it within Python.
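
Something along these lines captures the idea (the filename is made up):

    # Input: grab a value from the keyboard.
    name = input("What's your name? ")

    # Input: read lines from a file ("data.txt" is a hypothetical name).
    with open("data.txt") as f:
        lines = f.readlines()

    # Processing: count the lines.
    count = len(lines)

    # Output: report the result.
    print(f"Hi {name}, your file has {count} lines.")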

Thurs 04 Oct 2018: We started our deep dive into Python's ability to store data of all types: numbers (int, float), strings (str), and booleans (bool). Then we took a long, hard look at lists... ending when we uncovered a use-case which doesn't suit them. This led into an introduction to dictionaries (which we will continue to explore in the AM). And - lest we forget - we did look at the wonderful sorted() function, too.
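
To give a flavour of the use-case in question (the names and scores are invented): pairing values with labels is clumsy with lists, but natural with a dictionary, and sorted() happily works on both:

    # A list is grand for an ordered collection of values...
    scores = [61, 72, 58]

    # ...but associating each score with a name is where lists struggle.
    # A dictionary keeps the pairing explicit:
    results = {"Anne": 61, "Brian": 72, "Clare": 58}
    print(results["Brian"])      # look up by name, not by position

    # sorted() takes any iterable and returns a new, sorted list.
    print(sorted(scores))        # [58, 61, 72]
    print(sorted(results))       # the keys, alphabetically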

Fri 05 Oct 2018: Did a deep dive into Python's dictionary technology, and also looked at frequency counting (as well as the collections module's Counter).
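
Here's the gist, first by hand with a dictionary, then with Counter (the sentence is made up):

    from collections import Counter

    words = "the cat sat on the mat with the hat".split()

    # Frequency counting by hand:
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    print(freq)

    # Or let Counter do the heavy lifting:
    print(Counter(words).most_common(3))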

Thurs 11 Oct 2018: Today was all about sets, tuples, and superheroes.
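
A superhero-themed sketch of both (the details are fictional, obviously):

    # Sets: unordered, no duplicates, great for membership tests.
    strong = {"Thor", "Hulk", "Superman"}
    avengers = {"Thor", "Hulk", "Hawkeye"}
    print(strong & avengers)    # intersection: heroes in both sets

    # Tuples: ordered and immutable, handy for fixed records.
    hero = ("Thor", "Asgard", 1500)    # name, home, age
    name, home, age = hero             # tuple unpacking
    print(name, "hails from", home)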

Fri 12 Oct 2018: Distributed and discussed first chunk of coursework.

Wed 17 Oct 2018: Learned how to abstract code into a function (and a module) so that it can be shared more easily.
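
The general shape of the idea (the names here are hypothetical):

    # In a file called "utils.py" (a made-up module name):
    def shout(text):
        """Return an upper-cased, exclaimed version of 'text'."""
        return text.upper() + "!"

    # Then, from any notebook or program in the same folder:
    #     import utils
    #     print(utils.shout("hello"))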

Thurs 18 Oct 2018: We discussed the CSV, JSON, and XML data formats (spending some time on the DATA.GOV.IE website). Then we learned just enough HTML to be dangerous, and started thinking about extracting drug names from the web pages on the NCPE.IE website.
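
For reference, reading CSV and JSON needs nothing beyond the standard library (the filenames are made up):

    import csv
    import json

    # CSV: rows of comma-separated values.
    with open("drugs.csv") as f:          # hypothetical filename
        for row in csv.reader(f):
            print(row)                    # each row arrives as a list of strings

    # JSON: nested data that loads as dictionaries and lists.
    with open("drugs.json") as f:         # hypothetical filename
        data = json.load(f)
    print(type(data))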

Fri 19 Oct 2018: Created our first web scraping system, which grabs all the drug names from the NCPE website.
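
The bare bones look something like this, assuming requests and Beautiful Soup (the URL and the idea that each drug name sits in a link are assumptions, not NCPE's actual page structure):

    import requests
    from bs4 import BeautifulSoup

    URL = "https://www.ncpe.ie/drugs/"    # assumed starting URL

    response = requests.get(URL)
    soup = BeautifulSoup(response.text, "html.parser")

    # Real selectors depend on the page's actual HTML.
    for link in soup.find_all("a"):
        print(link.get_text(strip=True))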

Wed 24 Oct 2018: Installed/configured MariaDB to store the data scraped from the web, then used DBcm to access the created database table from our notebooks.
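
DBcm's UseDatabase context manager keeps the database code short and sweet (the credentials, database, and table names below are made up):

    from DBcm import UseDatabase

    config = {
        "host": "127.0.0.1",
        "user": "scraper",       # made-up credentials
        "password": "secret",
        "database": "drugsDB",   # made-up database name
    }

    with UseDatabase(config) as cursor:
        _SQL = "select * from drugs"     # assumed table name
        cursor.execute(_SQL)
        data = cursor.fetchall()

    print(data)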

Thurs 25 Oct 2018: Continued our exploration of NCPE's drug data, first extracting the drug trade names, then looking at the dates on each of the pages. This produced some interesting results (especially after we installed datefinder).
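
datefinder's find_dates function scans free text and yields datetime objects (the text below is invented):

    import datefinder

    text = "Reviewed on 25 October 2018; updated 26/10/18."   # invented
    for match in datefinder.find_dates(text):
        print(match)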

Fri 26 Oct 2018: Returned to the issue of extracting the trade name from the H1 full heading, and created a small function to do the leg-work. It works about 95% of the time... now to make it work for the other 5%.
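
To give a flavour of the leg-work, here's a hypothetical version (it assumes headings shaped like "Adalimumab (Humira)", which is exactly the kind of assumption that fails 5% of the time):

    def extract_trade_name(heading):
        """Pull the bracketed trade name out of a full H1 heading.

        Returns None when no bracketed name is found.
        """
        if "(" in heading and ")" in heading:
            start = heading.index("(") + 1
            end = heading.index(")", start)
            return heading[start:end]
        return None

    print(extract_trade_name("Adalimumab (Humira)"))   # Humira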

Wed 31 Oct 2018: Revisited the extract_trade_name function one last time, then took a look at Harry's code... and learned about the black code formatter.

Thurs 01 Nov 2018: Conferring.

Fri 02 Nov 2018: Conferring.

Wed 07 Nov 2018: Coursework #2 distributed.

Thurs 08 Nov 2018: We scraped the James Bond Wikipedia page.

Fri 09 Nov 2018: We took a look at a solution to Coursework #1 from last year.

Wed 14 Nov 2018: We started to look at (and learn about) numpy.
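
A small taste of what numpy brings to the table (the numbers are invented):

    import numpy as np

    temps = np.array([12.0, 15.0, 9.0, 21.0, 18.0])   # made-up data

    print(temps.mean())         # vectorised statistics
    print(temps * 9 / 5 + 32)   # element-wise maths: Celsius to Fahrenheit
    print(temps[temps > 14])    # boolean indexing: just the warm ones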

Thurs 15 Nov 2018: Open Day (Carlow).

Fri 16 Nov 2018: Intro to pandas.
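
A first taste of the DataFrame (the rows are invented):

    import pandas as pd

    df = pd.DataFrame({
        "drug": ["Humira", "Keytruda", "Ocrevus"],   # made-up rows
        "year": [2018, 2017, 2018],
    })

    print(df.head())
    print(df[df["year"] == 2018])      # select rows by condition
    print(df["year"].value_counts())   # a quick frequency count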

Wed 21 Nov 2018: More requests/pandas practice with the NCPE and EMA data.

Thurs 22 Nov 2018: More pandas (with Paul not having a good day). We ran into problems selecting the data we need. We also looked at extracting scraped web data from a local database into a pandas dataframe, and then an Excel spreadsheet.
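
The database-to-DataFrame-to-Excel round trip looks roughly like this; we used MariaDB in class, but sqlite3 keeps the sketch self-contained (the filenames and table name are made up, and to_excel needs the openpyxl package installed):

    import sqlite3
    import pandas as pd

    # Any DB-API connection will do here.
    conn = sqlite3.connect("drugs.db")              # hypothetical database file
    df = pd.read_sql("select * from drugs", conn)   # assumed table name
    conn.close()

    df.to_excel("drugs.xlsx", index=False)          # out to a spreadsheet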

Fri 23 Nov 2018: Finished off the EMA notebook (with corrections), then started to talk about the weather.

Wed 28 Nov 2018: No class, just a chat, as the weather was against us (and we had a 2-hour power outage).

Thurs 29 Nov 2018: Ran through Q1 for the second coursework, then looked at the weather scraping and display notebooks. First exposure to plots.
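
A first plot, matplotlib-style (the readings are invented):

    import matplotlib.pyplot as plt

    hours = list(range(24))
    temps = [9 + (h % 12) * 0.5 for h in hours]   # made-up readings

    plt.plot(hours, temps)
    plt.xlabel("Hour of day")
    plt.ylabel("Temperature (°C)")
    plt.title("A made-up day of weather")
    plt.show()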

Fri 30 Nov 2018: More weather (and more plots).

Wed 05 Dec 2018: Today, we "altaired" the weather.
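
"Altairing" the weather amounts to something like this (the data is invented; in a notebook, the chart renders inline):

    import altair as alt
    import pandas as pd

    df = pd.DataFrame({
        "hour": list(range(24)),
        "temp": [9 + (h % 12) * 0.5 for h in range(24)],   # made-up data
    })

    chart = alt.Chart(df).mark_line().encode(
        x="hour:Q",
        y="temp:Q",
    )
    chart.save("weather.html")   # or simply display 'chart' in a notebook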

Thurs 06 Dec 2018: NLTK and wordclouds.
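
The gist, assuming the nltk and wordcloud packages are installed (the text is invented):

    import nltk
    from nltk.corpus import stopwords
    from wordcloud import WordCloud

    nltk.download("stopwords")   # a one-time download

    text = "data science python pandas data python data"   # invented text

    wc = WordCloud(stopwords=set(stopwords.words("english")))
    wc.generate(text)
    wc.to_file("cloud.png")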

Fri 07 Dec 2018: Prep-work for publishing visualisations on the web, as well as the creation of a simple webapp written in Python's Flask.
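
A Flask webapp really can be this small:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "Hello from our wee webapp!"

    if __name__ == "__main__":
        app.run(debug=True)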

Wed 12 Dec 2018: No class, as Philip had presentations.

Thurs 13 Dec 2018: Completed the first version of our webapp which integrates the display of the raw data as well as some Altair visuals. Even went as far as deploying the webapp to PythonAnywhere (on the cloud).

Fri 14 Dec 2018: Last day of term. Happy Holidays!

