Paul Barry's Website - Programming for Data Scientists - Semester 1

Module on the M.Sc. in Data Science

M.Sc. in Data Science: Programming for Data Scientists - Semester 1

Academic Year 2018/19 - Class Log.

Wed 26 Sep 2018: Welcome! We found out that Linux - specifically Ubuntu (or Xubuntu if your laptop is a little older) - is the recommended platform for this course. Anaconda for Python 3 is the recommended Python distribution. Paul discussed the indicative content, installed Anaconda on his Linux machine, and demonstrated a very simple Jupyter Notebook. He also talked a lot (probably too much). Tomorrow, we'll all start to learn Python programming in anger.

Thurs 27 Sep 2018: After a shaky start, we managed to connect to the Internet and start our Python travels. We learned about numbers (int), words (string), and truth values (boolean), then started to discuss the Three S's: sequence, iteration, and selection.

Fri 28 Sep 2018: Continued to learn about the Three S's... with more on iteration, and then selection. See today's notebook for all the details.

Wed 03 Oct 2018: Learned about input-processing-output, taking data from input sources (keyboard, files) and processing within Python.

Thurs 04 Oct 2018: We started our deep dive into Python's ability to store data of all types: numbers (int, float), strings (str), and booleans (bool). Then we took a long, hard look at lists... ending when we uncovered a use-case which doesn't suit them. This led into an introduction to dictionaries (which we will continue to explore in the AM). And - lest we forget - we did look at the wonderful sorted() function, too.
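
As a reminder of what sorted() does, here is a minimal sketch. The sample lists are made up for illustration and are not the data used in class.

```python
# Hypothetical sample data (not from the class notebook).
scores = [42, 7, 19, 3]
names = ["zoe", "Adam", "mia"]

# sorted() returns a new list and leaves the original untouched.
ascending = sorted(scores)

# reverse=True flips the order.
descending = sorted(scores, reverse=True)

# key= controls what is compared; here, the lowercased name.
by_lower = sorted(names, key=str.lower)

print(ascending)
print(descending)
print(by_lower)
print(scores)  # still in its original order
```

Because sorted() works on any iterable, the same call also accepts a dictionary (it then sorts the keys).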

Fri 05 Oct 2018: Did a deep dive into Python's dictionary technology, and also looked at frequency counting (as well as the collections module's Counter).
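
Frequency counting can be sketched in two ways: by hand with a plain dictionary, and with Counter from the collections module. The sentence below is an assumption made for illustration, and the notebook from class may have taken a different route.

```python
from collections import Counter

# Hypothetical sample text (not from the class notebook).
text = "the quick brown fox jumps over the lazy dog the end"
words = text.split()

# By hand: use dict.get() to start each new word at zero.
freq = {}
for word in words:
    freq[word] = freq.get(word, 0) + 1

# The same count with collections.Counter.
counts = Counter(words)

print(freq["the"])
print(counts.most_common(1))
```

Both approaches give the same tallies. Counter adds conveniences such as most_common(), which returns (item, count) pairs ordered from most to least frequent.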

Thurs 11 Oct 2018: Today was all about sets, tuples, and superheroes.

Fri 12 Oct 2018: Distributed and discussed first chunk of coursework.

Wed 17 Oct 2018: Learned how to abstract code into a function (and a module) so that it can be shared more easily.

Thurs 18 Oct 2018: We discussed the CSV, JSON, and XML data formats (spending some time on the DATA.GOV.IE website). Then we learned enough HTML to be just dangerous enough to think about extracting drug names from the web pages on the NCPE.IE website.
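
To compare the three formats, here is a small sketch that reads the same kind of records from CSV, JSON, and XML using only the standard library. The records are made up and are not taken from DATA.GOV.IE or NCPE.IE.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical records, held in strings so that no files are needed.
csv_text = "name,year\nAda,1815\nGrace,1906\n"
xml_text = "<people><person name='Ada'/><person name='Grace'/></people>"

# CSV: each row becomes a dictionary keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: convert the rows to a JSON string, then parse it back.
json_text = json.dumps(rows)
back = json.loads(json_text)

# XML: parse the document and read an attribute from each element.
root = ET.fromstring(xml_text)
names = [person.get("name") for person in root.findall("person")]

print(rows)
print(back == rows)
print(names)
```

Note that csv.DictReader returns every value as a string, so the years above are "1815" and "1906" rather than integers.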

Fri 19 Oct 2018: Created our first web scraping system, which grabs all the drug names from the NCPE website.

Wed 24 Oct 2018: Installed/configured MariaDB to store the data scraped from the web, then used DBcm to access the created database table from our notebooks.

Thurs 25 Oct 2018: Continued our exploration of NCPE's drug data, first extracting the drug trade names, then looking at the dates on each of the pages. This produces some interesting results (especially after we installed datefinder).

Fri 26 Oct 2018: Returned to the issue of extracting the trade name from the H1 full heading, and created a small function to do the leg-work. It works about 95% of the time... now to make it work for the other 5%.

Wed 31 Oct 2018: Revisited the extract_trade_name function one last time, then took a look at Harry's code... and learned about the black code formatter.

Thurs 01 Nov 2018: Conferring.

Fri 02 Nov 2018: Conferring.

Wed 07 Nov 2018: Coursework #2 distributed.

Thurs 08 Nov 2018: We scraped the James Bond Wikipedia page.

Fri 09 Nov 2018: We took a look at a solution to Coursework #2 from last year.

Wed 14 Nov 2018: We started to look at (and learn about) numpy.

Thurs 15 Nov 2018: Open Day (Carlow).

Fri 16 Nov 2018: Intro to pandas.
