CSIS 638 - Summer 2018

Schedule

Important: Everything listed under a date is due that day by midnight anywhere on earth (https://www.timeanddate.com/time/zones/aoe).

7/5

Introduction and setup

  1. Introduction Presentation
  2. Course General Slides for Comments and Discussion: https://docs.google.com/presentation/d/1Q1oo8du-qWbBxAHVaGuJorjb0h8OLfwcnc2NUYpWmVw/edit?usp=sharing
  3. Fill out the following contact information form: https://goo.gl/forms/yOC8ZixAmt4Ue7py2
  4. Create a shared folder named lastname_firstname_csis_638 and share it with pauleanderson@gmail.com, granting full modify privileges. For example, my folder would be called Anderson_Paul_csis_638.

7/6

B-Trees

  1. Create a copy of the following presentation in your lastname_firstname_csis_638 folder: https://docs.google.com/presentation/d/1vuKgUqbi6AC7syaDgcdquxGTS1fhAsunb1jxLoiR_Ig/edit?usp=sharing
  2. Complete the readings, prompts, and questions in the presentation.

7/9

Serializability in Databases

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/1Nma61WxG8b3ohjO3FAWwNfU6nrawFeVZdhdguTk2YDI/edit?usp=sharing
  2. Complete the readings, prompts, and questions in the presentation.

B-Tree Programming Assignment Notes

More notes that should help with the assignment: https://docs.google.com/presentation/d/1RKpdSKFq0eoESs6CcaNW4TCFbsA4Ur3IHFEItziSHMc/edit?usp=sharing

B-Tree Programming Assignment Part 1

  1. The overall assignment is to complete a pure Python implementation of a B-Tree by Friday at midnight. First, a warning: as with most things, you can definitely find code online to help you. Do NOT use this code at all. If I suspect anyone of “cheating” on the programming assignments, I’ll just quietly compile what evidence I see and submit it to the CofC honor board, which will then review it. Nothing personal here; it’s just the only way to deal with programming assignments. With that out of the way, the point of the assignment is to learn more about B-Trees. Making steady progress but not arriving at a 100% solution will still get an OK grade, so there really is no reason to cheat. Each day of the week, you’ll be asked to make a little more progress on the assignment and push your code to a GitHub repo that I can look at the next day to give you feedback, tips, hints, etc. There are only 6 students, so this is ideal and awesome.
  2. By the end of the day, please create a GitHub username (if you don’t already have one), and fill out this form with that information: https://goo.gl/forms/YGyyd5NfGaitfbN82.
  3. Download and configure a Python 3.6 development environment of your choosing. It can’t be IDLE… We are in graduate school. Here is a list: https://wiki.python.org/moin/IntegratedDevelopmentEnvironments.
  4. If you aren’t familiar with Python, complete one of the many online introductions to the language, such as https://developers.google.com/edu/python/. Nothing to turn in here; just make sure you’re armed with a little bit of Python exposure. It’s an easy language. I would definitely recommend practicing with pdb, the Python debugger.
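If you have not used pdb before, a small function is enough to practice on. The sketch below is only a suggestion: the function name find_slot and its debug flag are made up for illustration and are not part of the assignment. It looks up where a key belongs in a sorted list and optionally pauses in the debugger first.

```python
import bisect
import pdb


def find_slot(keys, key, debug=False):
    """Return the index where key sits (or would be inserted) in sorted keys."""
    if debug:
        # Execution pauses here. Useful pdb commands:
        #   p keys   print a variable
        #   n        run the next line
        #   s        step into a call
        #   c        continue running
        pdb.set_trace()
    return bisect.bisect_left(keys, key)


# Normal run, no debugger.
print(find_slot([10, 20, 30], 25))
```

Calling find_slot([10, 20, 30], 25, debug=True) from a terminal drops you into the pdb prompt, where you can inspect keys and key before stepping through the lookup.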

7/10

Concurrency Control

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/1i8ruefyu8O-GKctPvjtnU77VQGbruUqrZ2-52SecQPU/edit?usp=sharing.
  2. Complete the readings, prompts, and questions in the presentation.

B-Tree Programming Assignment Part 2

I’ll share a repo with your GitHub account called btree_lastname_firstname. By the end of the day, I want you to have pushed a Python file and some progress on the overall goal of creating a btree class. If your pushed code doesn’t run, 0/5 points. If your code runs, but you really haven’t added any functionality, 1/5 points. If a little functionality has been added, but it’s clear that it took minutes to do so, 2/5 points. If it looks like you’ve put in some solid work on adding code, 5/5 points. If you get the entire assignment done, your work on the programming assignment for this week is complete. Each time you push the code, you must write a good commit message that describes what you’ve done in detail and in complete sentences. If you don’t, 1 point will be subtracted from your score. Further, if you don’t comment your code, 1 point will be subtracted from your score.

7/11

Sequential Indexing

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/1Qoe4RraW7YyQHaa0XiVCL50VuZPBv1QHNyGljJufjME/edit?usp=sharing.
  2. Complete the readings, prompts, and questions in the presentation.

B-Tree Programming Assignment Part 3

By the end of the day, I want you to have pushed an update with some progress on the overall goal of creating a btree class. If your pushed code doesn’t run, 0/5 points. If your code runs, but you really haven’t added any functionality, 1/5 points. If a little functionality has been added, but it’s clear that it took minutes to do so, 2/5 points. If it looks like you’ve put in some solid work on adding code, 5/5 points. If you get the entire assignment done, your work on the programming assignment for this week is complete. Each time you push the code, you must write a good commit message that describes what you’ve done in detail and in complete sentences. If you don’t, 1 point will be subtracted from your score. Further, if you don’t comment your code, 1 point will be subtracted from your score.

7/12

Distributed Hash Table

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/1I_r_PyVbABKSOE2tUnU34YUEumx8i53bPPh-VP8npO4/edit?usp=sharing.
  2. Complete the readings, prompts, and questions in the presentation.

B-Tree Programming Assignment Part 4

By the end of the day, I want you to have pushed an update with some progress on the overall goal of creating a btree class. If your pushed code doesn’t run, 0/5 points. If your code runs, but you really haven’t added any functionality, 1/5 points. If a little functionality has been added, but it’s clear that it took minutes to do so, 2/5 points. If it looks like you’ve put in some solid work on adding code, 5/5 points. If you get the entire assignment done, your work on the programming assignment for this week is complete. Each time you push the code, you must write a good commit message that describes what you’ve done in detail and in complete sentences. If you don’t, 1 point will be subtracted from your score. Further, if you don’t comment your code, 1 point will be subtracted from your score.

7/13

Capacity Planning and Configuration

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/16OKcmdkauQlIgiSGZ7HBOjTMxp0i1-oFMgpufcTqBeQ/edit?usp=sharing.
  2. Complete the readings, prompts, and questions in the presentation.

7/15

B-Tree Programming Assignment Part 5

By the end of the day, I want you to have pushed an update with a working btree implementation (10 points). If your pushed code doesn’t run, 0/10 points. If your code runs, but you really haven’t added any functionality, 1/10 points. If a little functionality has been added, but it’s clear that it took minutes to do so, 2/10 points. If it looks like you’ve put in some solid work on adding code, 5/10 points. If you get the entire assignment done, your work on the programming assignment for this week is complete (10/10 points). Each time you push the code, you must write a good commit message that describes what you’ve done in detail and in complete sentences. If you don’t, 1 point will be subtracted from your score. Further, if you don’t comment your code, 1 point will be subtracted from your score.

7/16

Nothing due

7/17

Nothing due

7/18

Distributed Databases

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/1zpdMCY_jkyqQ-RNcWKXou9Dm5zhCY-ZgCVZXASV73HI/edit?usp=sharing
  2. Complete the readings, prompts, and questions in the presentation.

7/19

Google BigTable

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/18U3JghMeqeen_KNnvzHkKmYL1ibS1VLgWOtDA-uIlS4/edit?usp=sharing
  2. Complete the readings, prompts, and questions in the presentation.

7/20

Cassandra

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/1wC53lMd6BIIDnw4aixZhihYP77gGBt-EhZODe63S_Io/edit?usp=sharing
  2. Complete the readings, prompts, and questions in the presentation.

7/22

Cloud Test Environment

Our main goal this weekend is for everyone to deploy at least three virtual machines on Microsoft Azure and make sure they can all see each other. Your main task over the weekend is to familiarize yourself with their system. They provide free (no credit card needed) accounts for students. I found their interface pretty straightforward, but I’ve been doing this a long time, so I’m giving you the entire weekend to make sure you can deploy 3 VMs, have them all communicate with each other, and access them via ssh. Here are some links to help you along the way.

  1. Start by getting your free $100 credit: https://azure.microsoft.com/en-us/free/students/.
  2. Make sure you know about ssh private/public keys: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/mac-create-ssh-keys.
  3. I called my resource group CSIS638.
  4. You’ll need to copy and paste your public key when creating your VM.
  5. Make sure you create an Ubuntu 17.10 VM. Other OS versions will of course work, but life will be easier if we are all on the same platform.
  6. I went with the cheapest model for the VMs. Our goal is to get experience and not great performance.
  7. You’ve got to enable SSH (port 22) as a public inbound port.
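A quick way to confirm that port 22 is reachable on each VM is to attempt a TCP connection to it. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration and the host addresses are placeholders you would replace with your own VM addresses. It only shows that something accepts connections on the port, so you should still log in with ssh to confirm your key works.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_hosts(hosts, port=22):
    """Map each host to whether the given port accepted a connection."""
    return {host: port_open(host, port) for host in hosts}


if __name__ == "__main__":
    # Placeholder addresses; substitute the addresses of your three VMs.
    for host, ok in check_hosts(["10.0.0.4", "10.0.0.5", "10.0.0.6"]).items():
        print(host, "reachable" if ok else "NOT reachable")
```

Running the same script from inside one of the VMs, with the private addresses of the other two, is one way to check that the machines can see each other.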

What to turn in:

  1. Create a copy of the following presentation in your folder: https://docs.google.com/presentation/d/15Ory2-5Ak11MPd_W1WYBGvObIYjI2OcfUnNEf-xKh1Q/edit?usp=sharing
  2. Complete the readings, prompts, and questions in the presentation.

7/23

  1. Read the base description
  2. Make sure you can see and ask questions on https://docs.google.com/presentation/d/1PCWkrlzfuZ30mRaonaDszgGXxzEhS0RjdUgAlSH6B1o/edit?usp=sharing. I’ve shared this with your Gmail accounts.
  3. Get Cassandra installed on your Azure VMs and read through the documentation: http://cassandra.apache.org/download/. Nothing to turn in for this day and step.
  4. Just a note that we’ll be using your btree GitHub repos. We should have called them something generic, but we’ll just ignore the bad name. You can move things around in there as you like but don’t change the name.

7/24

  1. Install, configure, and run Cassandra on all three of your virtual machines. Upload screenshots and any notes you have to your GitHub repository. All three nodes should see each other. (10 points)
  2. Pick one of the concepts outlined in the base description and engineer a storage solution with Cassandra. You must document your decisions as everyone’s decisions can be different. You must make an attempt to script everything so the system can be replicated. In other words, you should always strive to have a repository that can be pulled in by a user and deployed using only commands. You should have at least one example and tests showing how this portion of the system will be used. (20 points)
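One way to script the check that all three nodes see each other is to count the nodes that nodetool status reports as Up/Normal. This is a minimal sketch, not a required approach: the function names and the sample text are illustrative assumptions, and it assumes each node line in the status output begins with a two-letter status/state code (UN for Up/Normal).

```python
import subprocess

# Illustrative sample only; the addresses and values are made up.
SAMPLE_STATUS = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID  Rack
UN  10.0.0.4  103.5 KiB  256     ?     aaaa     rack1
UN  10.0.0.5  98.2 KiB   256     ?     bbbb     rack1
DN  10.0.0.6  101.7 KiB  256     ?     cccc     rack1
"""


def count_up_normal(status_text):
    """Count node lines whose first field is 'UN' (Up/Normal)."""
    return sum(
        1 for line in status_text.splitlines()
        if line.split()[:1] == ["UN"]
    )


def read_status():
    """Run 'nodetool status' on the local node and return its text output."""
    result = subprocess.run(
        ["nodetool", "status"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.6
        check=True,
    )
    return result.stdout


# Only two of the three sample nodes are up, so this prints 2.
print(count_up_normal(SAMPLE_STATUS))
```

On a VM where Cassandra is running, count_up_normal(read_status()) == 3 could serve as a simple test to keep in your repository alongside your screenshots.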

7/25

Pick another one of the concepts outlined in the base description and engineer a storage solution with Cassandra. You must document your decisions as everyone’s decisions can be different. You must make an attempt to script everything so the system can be replicated. In other words, you should always strive to have a repository that can be pulled in by a user and deployed using only commands. You should have at least one example and tests showing how this portion of the system will be used. (20 points)

7/26

Pick another one of the concepts outlined in the base description and engineer a storage solution with Cassandra. You must document your decisions as everyone’s decisions can be different. You must make an attempt to script everything so the system can be replicated. In other words, you should always strive to have a repository that can be pulled in by a user and deployed using only commands. You should have at least one example and tests showing how this portion of the system will be used. (20 points)

7/27

Pick another one of the concepts outlined in the base description and engineer a storage solution with Cassandra. You must document your decisions as everyone’s decisions can be different. You must make an attempt to script everything so the system can be replicated. In other words, you should always strive to have a repository that can be pulled in by a user and deployed using only commands. You should have at least one example and tests showing how this portion of the system will be used. (20 points)

7/30

This is our final class and your final data science database system is due on this day. It should include documentation of your design choices as well as a functioning system with documentation (30 points).