DATA 101 - Summer 2018

Syllabus

Official Course Description

Introduction to the use of computer based tools for the analysis of large data sets for the purpose of knowledge discovery. Students will learn to understand the Data Science process and the difference between deductive hypothesis-driven and inductive data-driven modeling. Students will have hands-on experience with various on-line analytical processing and data mining software and complete a project using real data.

Required Text

There is no required textbook to purchase for this course, but there will be a lot of required reading, along with online tutorials and guides. The two online resources we will be leaning on for structure are:

Our Plan

Data science is an evolving, complex, and deep subject. Making things more interesting is that there are no prerequisites for this course (which is on purpose). So the question is: how do you eat an elephant? Answer: one bite at a time. We’ll pick very specific components of data science and study them. This is as opposed to what I consider a superficial, high-level approach where no mastery is achieved. We’ll spend some time developing Python programming skills, we’ll spend time creating intelligent systems to predict behavior, and we’ll spend some time talking about the field in general. I’ll do my best to reduce technology-related points of friction, but since this is a technical course, you can always expect some technical difficulties. If you’ve already had our introduction to programming course, then you know some Python, which is great. In that case, you’ll be asked to complete alternative assignments.

Every DATA 101 course, I search online to find a suitable real-world competition for students to compete in. We match this with an internal competition/judging with local judges from industry. It is a lot of fun. Ideally, this would line up with the end of the semester, but alas, this almost never happens.

If you have laptops, please bring them to each class. We’ll be using my personal cluster, so no need to have a powerful machine.

Course Details

Contact Information

  • Professor: Dr. Paul Anderson
  • Office: 313 HWEA
  • Office Hours: 8:30 - 9:30 TR. My preferred method of contact is e-mail. I will endeavor to respond within 48 hours.
  • E-mail: andersonpe2@cofc.edu
  • Office Phone: 843-953-8151 (I never pick this up, but it does exist :)
  • Section 02 - TR: 9:55 am - 11:10 am in HWWE 307
  • Section 03 - TR: 11:20 am - 12:35 pm in HWWE 307

Course (learning) outcomes

Overall, the main learning objective of the course is to give students some experience with, and knowledge of, what a data scientist does.

  • To gain an overview of the field of data science
  • To learn and be able to program introductory machine learning algorithms (e.g., k-nearest neighbors, k-means, decision trees)
  • To learn and be able to apply state-of-the-art machine learning algorithms (e.g., random forest, ensemble methods, neural networks)
  • To apply data mining, statistical inference, and machine learning algorithms to a variety of datasets, including text, image, biological, and health
  • To understand the social, ethical, and legal issues of informatics and data science
  • To learn a programming language for data science (e.g., Python and R)

Grading Policy

  • Quizzes - 40%
  • Programming Assignments - 30%
  • Project/Competition Reports - 20%
  • Homework - 10%

Grading Scale: A: 90-100; B: 80-89; C: 70-79; F: <70. Plusses will be used at the discretion of the instructor.
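
To make the arithmetic concrete, here is a minimal sketch of how the category weights and the grading scale above could combine into a course grade. It assumes each category average is on a 0-100 scale and that 90, 80, and 70 act as lower cutoffs on the unrounded score; the syllabus does not specify rounding, and plusses are left out because they are at the instructor's discretion. The function names and the example scores are hypothetical.

```python
# Category weights taken from the Grading Policy list (they sum to 1.0).
WEIGHTS = {
    "quizzes": 0.40,
    "programming": 0.30,
    "project": 0.20,
    "homework": 0.10,
}


def course_score(averages):
    """Weighted average of per-category averages, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * averages[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 score to a letter grade.

    Assumption: 90/80/70 are lower cutoffs applied to the unrounded score.
    """
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "F"


# Hypothetical student averages, for illustration only.
example = {"quizzes": 85, "programming": 92, "project": 88, "homework": 100}
total = course_score(example)
print(round(total, 1), letter_grade(total))
```

Because quizzes carry the largest weight, a strong homework average moves the total far less than a comparable change in the quiz average.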

Grading Guidelines: Submitted work requires Analysis, Evaluation, and Creation of ideas, concepts, and materials into various deliverables (see the revised Bloom’s Taxonomy and the reference below).

  • The grade of A is for work that involves high-quality achievement in all three Bloom areas.
  • The grade of B is for work that involves high-quality achievement in at least two Bloom areas, and medium-level achievement in the other.
  • The grade of C is for work that involves high-quality achievement in at least one Bloom area, and medium-level achievement in the others.
  • The grade of F is for work that does not meet the above criteria.

Reference: Errol Thompson, Andrew Luxton-Reilly, Jacqueline L. Whalley, Minjie Hu, and Phil Robbins. 2008. Bloom’s taxonomy for CS assessment. In Proceedings of the tenth conference on Australasian computing education - Volume 78 (ACE ’08), Simon Hamilton and Margaret Hamilton (Eds.), Vol. 78. Australian Computer Society, Inc., Darlinghurst, Australia, 155-161.

Feedback will be given as quickly as possible with a goal of within a week of the assignment due date.

Homework Policy

No late homework will be accepted. Cheating/sharing will result in a zero on the assignment and a report to the judicial board.

Programming Assignments

There will be a combination of in-class lab assignments and out-of-class programming assignments.

Honor Code

Lying, cheating, attempted cheating, and plagiarism are violations of our Honor Code that, when identified, are investigated. Each instance is examined to determine the degree of deception involved.

Incidents where the professor believes the student’s actions are clearly related more to ignorance, miscommunication, or uncertainty, can be addressed by consultation with the student. We will craft a written resolution designed to help prevent the student from repeating the error in the future. The resolution, submitted by form and signed by both the professor and the student, is forwarded to the Dean of Students and remains on file.

Cases of suspected academic dishonesty will be reported directly to the Dean of Students. A student found responsible for academic dishonesty will receive an XF in the course, indicating failure of the course due to academic dishonesty. This grade will appear on the student’s transcript for two years, after which the student may petition for the X to be expunged. The student may also be placed on disciplinary probation, suspended (temporary removal), or expelled (permanent removal) from the College by the Honor Board.

It is important for students to remember that unauthorized collaboration (working together without permission) is a form of cheating. Unless a professor specifies that students can work together on an assignment and/or test, no collaboration is permitted. Other forms of cheating include possessing or using an unauthorized study aid (such as a PDA), copying from another’s exam, fabricating data, and giving unauthorized assistance.

Remember, research conducted and/or papers written for other classes cannot be used in whole or in part for any assignment in this class without obtaining prior permission from the professor.

Students can find a complete version of the Honor Code and all related processes in the Student Handbook at http://www.cofc.edu/studentaffairs/general_info/studenthandbook.html.

Disability Accommodations

If you feel you may need an accommodation based on the impact of a disability, please contact me individually to discuss your specific needs. Also, please contact the College of Charleston Center for Disability Services (http://www.cofc.edu/~cds/) for additional help.

Late Policy

No late days will be allowed.