Paul Barry's Website - Programming for Data Scientists - Semester 1

Module on the M.Sc. in Data Science

M.Sc. in Data Science: Programming for Data Scientists - Semester 1

Academic Year 2019/20 - Class Log.

Week 1: Wed Sept. 25th: Welcome! Introduced the module.

Thurs Sept. 26th: Introduced Jupyter Notebooks with some simple Python code.

Fri Sept. 27th: Continued to work through the odd example, and discussed Visual Studio Code, too. See the emails sent to date (there should be five of them in your inbox). For next Wednesday, install Linux, then Anaconda and Git, on your computer.

Week 2: Wed Oct. 2nd: We learned about the three programming fundamentals: sequence, selection, and iteration.
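All three fundamentals show up in even a tiny function. The sketch below is illustrative only: the function name, threshold, and readings are hypothetical, not taken from the class notebooks.

```python
def count_warm_days(temperatures, threshold=15.0):
    """Count the readings above threshold (a hypothetical example)."""
    count = 0                     # sequence: statements run one after another
    for temp in temperatures:     # iteration: repeat the body for each reading
        if temp > threshold:      # selection: decide whether to act on this reading
            count += 1
    return count


print(count_warm_days([12.5, 16.0, 18.2, 9.9]))  # prints 2
```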

Thurs Oct. 3rd: Finished off the odds example, and started looking at how Python handles data.

Fri Oct. 4th: A small update to the Strings notebook, as well as a big chunk on Lists (with the exception of sorting, which we'll get to next week).

Week 3: Wed Oct. 9th: AM: finished off our discussion of lists by looking at how to sort lists containing different types of data. PM: talked about when not to use lists, which was our introduction to Python dictionaries.
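In Python 3, sorting a list that mixes numbers and strings raises a TypeError, because the two types can't be compared with `<`. One common workaround, sketched here with made-up data, is to pass a `key` function; the numbers-before-strings ordering is an assumption, not necessarily the rule used in class.

```python
mixed = [10, "9", 2.5, "apple"]

try:
    sorted(mixed)
except TypeError as err:
    print("Cannot sort directly:", err)


def mixed_key(item):
    """Sort numbers first (by value), then strings (alphabetically)."""
    if isinstance(item, (int, float)):
        return (0, item, "")
    return (1, 0, str(item))


print(sorted(mixed, key=mixed_key))  # [2.5, 10, '9', 'apple']
```

A simpler option is `key=str`, which compares everything as text (so `10` sorts before `2.5`).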

Thurs Oct. 10th: Concluded our introduction to dictionaries, then briefly covered sets and tuples. More tomorrow AM when we return to the weather data once more.

Fri Oct. 11th: The Zipping notebook concluded with code to create a list-of-lists and a list-of-dictionaries from data in a file. Having taken the time to write all the code "by hand", we then rewrote it in two lines thanks to Python's csv module. We also took our first look at list and dictionary comprehensions (which are COOL).
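The `csv` module covers both shapes of data. This sketch reads a small, hypothetical weather file (held in a `StringIO` so the example is self-contained) first as a list-of-lists, then as a list-of-dictionaries, and finishes with one list comprehension and one dictionary comprehension. It is not necessarily the exact two-line version written in class.

```python
import csv
import io

# Hypothetical weather data; in practice this would come from open("weather.csv").
raw = io.StringIO("date,temp,rain\n2019-10-01,12.1,0.4\n2019-10-02,13.5,0.0\n")

rows = list(csv.reader(raw))         # list-of-lists (the header row is included)
raw.seek(0)                          # rewind so the data can be read again
records = list(csv.DictReader(raw))  # list-of-dictionaries keyed by the header

# Comprehensions: convert the temperatures, and map each date to its rainfall.
temps = [float(r["temp"]) for r in records]
rain_by_date = {r["date"]: float(r["rain"]) for r in records}

print(rows[0])       # ['date', 'temp', 'rain']
print(temps)         # [12.1, 13.5]
print(rain_by_date)  # {'2019-10-01': 0.4, '2019-10-02': 0.0}
```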

Week 4: Wed Oct. 16th: AM: Watched the first 17 minutes of David Beazley's PyData talk as an introduction to Assignment 1. PM: a rather rushed (and chaotic) introduction to the DBcm module, which we used to successfully store some weather data in a local MariaDB database.
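DBcm's own interface isn't reproduced here. As a stand-in, this sketch uses the standard library's `sqlite3` module with an in-memory database instead of MariaDB; the table name, columns, and readings are hypothetical. Using the connection in a `with` statement commits the inserts automatically if the block succeeds (it does not close the connection, hence the explicit `close()`).

```python
import sqlite3

# Stand-in for a MariaDB server: an in-memory SQLite database.
readings = [("2019-10-01", 12.1), ("2019-10-02", 13.5)]  # hypothetical data

conn = sqlite3.connect(":memory:")
with conn:  # commits if the block succeeds, rolls back if it raises
    conn.execute("CREATE TABLE weather (day TEXT, temp REAL)")
    conn.executemany("INSERT INTO weather VALUES (?, ?)", readings)

stored = conn.execute("SELECT day, temp FROM weather ORDER BY day").fetchall()
print(stored)  # [('2019-10-01', 12.1), ('2019-10-02', 13.5)]
conn.close()
```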

Thurs Oct. 17th: Having talked about CSV already, we looked at other formats including PDF (yuk!), XML (yuk!), and JSON (nice!).

Fri Oct. 18th: Looked at the JSON-manipulation notebook, which converts a list of differing-sized dictionaries into a list of fixed-sized dictionaries. Also took a quick look at a module that provides (easy) reading of data from Excel spreadsheets.
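A minimal sketch of the differing-sized to fixed-sized conversion, assuming the records arrive as a JSON array of objects: gather every key that appears anywhere, then rebuild each dictionary with all of those keys. The sample data is hypothetical, and filling gaps with `None` (written back out as JSON `null`) is an assumed choice of placeholder.

```python
import json

# Hypothetical JSON input: the records don't all have the same keys.
raw = '[{"name": "Ann", "age": 31}, {"name": "Bob", "city": "Dublin"}]'
records = json.loads(raw)

# Collect every key seen, in first-seen order.
all_keys = []
for rec in records:
    for key in rec:
        if key not in all_keys:
            all_keys.append(key)

# Rebuild each record with every key; missing values become None.
fixed = [{key: rec.get(key) for key in all_keys} for rec in records]
print(json.dumps(fixed))
# [{"name": "Ann", "age": 31, "city": null}, {"name": "Bob", "age": null, "city": "Dublin"}]
```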

Week 5: Wed Oct. 23rd: AM: We had a quick review of HTML, concentrating on the structure of HTML tags, with particular emphasis on tables. PM: Created two web scraping notebooks which extracted data from Paul's sample webpage, as well as Wikipedia's James Bond page.

Thurs Oct. 24th: Worked through the rowspan information associated with the James Bond data, and ended up with all 25 rows of data stored in a database table, with four pieces of data per row.
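Cells with a `rowspan` appear once in the HTML but cover several rows, so a scraper has to copy their values down into the rows below. The sketch below assumes each row has already been extracted from the page as a list of `(text, rowspan)` pairs (a hypothetical intermediate format; the scraping step and `colspan` handling are not shown), and uses a few illustrative rows rather than the actual Wikipedia table.

```python
def expand_rowspans(rows):
    """Return a full grid, copying rowspan cells down into the rows they cover.

    Each input row is a list of (text, rowspan) pairs for the cells that
    actually appear in that row's HTML (a hypothetical intermediate format).
    """
    carry = {}  # column index -> [text, number of later rows still to fill]
    grid = []
    for row in rows:
        cells = list(row)  # this row's own cells, left to right
        out, col = [], 0
        while True:
            if col in carry:  # column covered by a rowspan from a row above
                text, left = carry[col]
                if left == 1:
                    del carry[col]
                else:
                    carry[col][1] = left - 1
            elif cells:  # otherwise consume the next real cell
                text, span = cells.pop(0)
                if span > 1:
                    carry[col] = [text, span - 1]
            else:
                break  # no cells left and nothing carried into this column
            out.append(text)
            col += 1
        grid.append(out)
    return grid


bond_rows = [
    [("Sean Connery", 2), ("Dr. No", 1), ("1962", 1)],
    [("From Russia with Love", 1), ("1963", 1)],
    [("George Lazenby", 1), ("On Her Majesty's Secret Service", 1), ("1969", 1)],
]
grid = expand_rowspans(bond_rows)
for line in grid:
    print(line)
```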

Fri Oct. 25th: This morning, we cheated at crosswords.

Week 6: Wed Oct. 30th to Nov. 1st: No classes due to Conferring (which means there's plenty of time to work on the first assignment, which is due at 5:00pm on Nov. 1st).

Week 7: Wed Nov. 6th: AM: Introduced Pandas with the Bond data. PM: Continued to work with Pandas using the Weather.CSV data. Additionally, we introduced Pandas Profiling.

Thurs Nov. 7th: More Pandas... with clean-ups and column manipulations, as well as conversions to other external formats (e.g., CSV, JSON, XLSX, and HTML).

Fri Nov. 8th: Even more Pandas.

Week 8: Wed Nov. 13th: AM: Introduced the idea of Tidy Data. PM: Started looking at visualisations, with a look at pandas plots as well as our first look at Altair.

Thurs Nov. 14th: No classes due to Open Day.

Fri Nov. 15th: More Altair.

Week 9: Wed Nov. 20th: AM: More Altair (concentrating on the examples in the docs), and we started the city-depot-location case study. PM: With the data prepared in the first class (this AM), we used the prepared data to produce the visualisation we needed with Altair, concluding the case study.

Thurs Nov. 21st: Reviewed progress with Indicative Content, and discussed web scraping issues with Assignment 2.

Fri Nov. 22nd: Everyone is working on assignments.

Week 10: Wed Nov. 27th: AM and PM: Tutorial session on Assignment 2, web scraping. Back to the normal schedule tomorrow.

Thurs Nov. 28th: A very simple web server (http.server) for serving up anything from anywhere, then we looked at using Flask for the same thing. Once we had our Flask app working, we deployed it to PythonAnywhere (in about 10 minutes).
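From the command line, `python -m http.server` serves the current directory. The sketch below does the equivalent from Python, with a `directory` argument so any folder can be served. It assumes Python 3.7 or later (for `ThreadingHTTPServer` and the handler's `directory` parameter); the helper name is hypothetical, and the Flask version and the PythonAnywhere deployment are not shown.

```python
import functools
import http.server
import threading
import urllib.request


def start_file_server(directory=".", port=8000):
    """Serve files from `directory` in a background thread (hypothetical helper)."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_file_server(port=0)  # port 0: let the OS pick a free port
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    with urllib.request.urlopen(url) as resp:
        print(resp.status)  # 200 for the directory listing (or index.html)
    server.shutdown()       # stop the serve_forever loop
    server.server_close()   # release the listening socket
```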

Fri Nov. 29th: We previewed the really important ESB data.

Week 11: Wed Dec. 4th: AM: Distributed and discussed the final assignment. PM: Worked through the "data un-tidying" exercise using Kaggle's stock prices data.
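Un-tidying is essentially a pivot: long rows holding one observation each become a wide table with one row per date and one column per stock. This pure-Python sketch uses a hypothetical helper name and made-up tickers and prices; in pandas, `df.pivot(index="date", columns="stock", values="close")` performs the same reshaping on a DataFrame.

```python
from collections import defaultdict


def untidy(rows, index, columns, values):
    """Pivot long (tidy) rows into a wide dict: {index value: {column value: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        # A later duplicate (same index and column) overwrites an earlier one.
        wide[row[index]][row[columns]] = row[values]
    return dict(wide)


# Hypothetical tidy data: one row per (date, stock) observation.
tidy = [
    {"date": "2018-01-02", "stock": "AAA", "close": 10.0},
    {"date": "2018-01-02", "stock": "BBB", "close": 20.0},
    {"date": "2018-01-03", "stock": "AAA", "close": 11.0},
    {"date": "2018-01-03", "stock": "BBB", "close": 19.5},
]
wide = untidy(tidy, index="date", columns="stock", values="close")
for date, prices in wide.items():
    print(date, prices)
# 2018-01-02 {'AAA': 10.0, 'BBB': 20.0}
# 2018-01-03 {'AAA': 11.0, 'BBB': 19.5}
```

Unlike this sketch, pandas' `pivot` raises an error on duplicate index/column pairs rather than silently keeping the last value.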

Thurs Dec. 5th: Devised a plan to build a dynamic webapp to publish the stock prices data, then started to work on it. First problem: generating the "big" list of checkboxes to allow the users of our webapp to select which stocks to report on.
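One way to generate the checkbox list is to build the HTML from the list of stock names. In this sketch the function name, the `stocks` field name, and the alphabetical ordering are assumptions. Every checkbox shares the same `name`, so the browser sends all the ticked values under that one field, and `html.escape` guards against names containing characters such as `&`.

```python
import html


def checkbox_list(names, field="stocks"):
    """Return HTML with one checkbox per name, sorted alphabetically."""
    lines = []
    for name in sorted(names):
        safe = html.escape(name, quote=True)  # escape &, <, >, and quotes
        lines.append(
            f'<label><input type="checkbox" name="{field}" value="{safe}"> {safe}</label><br>'
        )
    return "\n".join(lines)


print(checkbox_list(["BBB", "AAA"]))
# <label><input type="checkbox" name="stocks" value="AAA"> AAA</label><br>
# <label><input type="checkbox" name="stocks" value="BBB"> BBB</label><br>
```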

Fri Dec. 6th: Built the first iteration of our webapp, which processes the checkbox data.
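When a form like this is submitted, only the ticked checkboxes are sent, each as a repeated `stocks=<value>` pair. This sketch decodes a hypothetical submission with `urllib.parse.parse_qs`; the helper name is hypothetical. Inside a Flask view, `request.form.getlist("stocks")` returns the equivalent list.

```python
from urllib.parse import parse_qs


def selected_stocks(body, field="stocks"):
    """Return every value submitted for `field`, in submission order."""
    return parse_qs(body).get(field, [])  # parse_qs maps each name to a list of values


print(selected_stocks("stocks=AAA&stocks=CCC"))  # ['AAA', 'CCC']
print(selected_stocks(""))                       # [] (nothing ticked)
```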

Week 12: Wed Dec. 11th: AM: Continued to work on the webapp. PM: Added the code to allow for the display of any "untidied" DataFrame.

Thurs Dec. 12th: Extended the webapp once more to display the visualisation. This concludes our work on this webapp.

Fri Dec. 13th: Last class before the Christmas break. The final assignment is now active and is due on Friday, January 10th, 2020.
