Introduction to Data Science

Join the data revolution. Companies are searching for data scientists, and this specialized field demands multiple skills that are not easy to obtain through conventional curricula. Learn the basics of data science and leave armed with practical experience extracting value from big data.

About the Course

Commerce and research are being transformed by data-driven discovery and prediction. The skills required for data analytics at massive scale (scalable data management on and off the cloud, parallel algorithms, statistical modeling, and proficiency with a complex ecosystem of tools and platforms) span a variety of disciplines and are not easy to obtain through conventional curricula. Tour the basic techniques of data science, including both SQL and NoSQL solutions for massive data management (e.g., MapReduce and its contemporaries), algorithms for data mining (e.g., clustering and association rule mining), and basic statistical modeling (e.g., linear and non-linear regression).

Frequently Asked Questions

Will I get a Statement of Accomplishment after completing this class?

Yes. Students who successfully complete the class will receive a Statement of Accomplishment signed by the instructor.

What resources will I need for this class?

For this course, you will need an Internet connection and either (a) the ability to run a virtual machine locally or (b) the ability and knowledge to install the appropriate software yourself. The software includes Python 2.7 (with various libraries), R, and SQLite (or another database you are comfortable using). You will also have the opportunity to install and work with Hadoop, but for logistical reasons, no assignment will require it. Some assignments will be open-ended.

What level of programming experience should I have?

We expect intermediate programming experience in at least one language and some familiarity with database concepts. There will be programming assignments, but they are not designed to test knowledge of the language itself and will not involve any esoteric features. The languages we will use are Python, R, and SQL.

Recommended Background

We expect you to have intermediate programming experience and familiarity with databases, roughly equivalent to two college courses. There will be four programming assignments: two in Python, one in SQL, and one in R. The target audience is undergraduate students across disciplines who wish to build proficiency in working with large datasets and in using a range of tools to perform predictive analytics.

After taking this course, you may be interested in the three-course Certificate in Data Science offered through the University of Washington Professional and Continuing Education program. This online course provides an overview of, and introduction to, the more extensive material covered in that program, which offers classroom-based instruction by data scientists from Microsoft and other Seattle-area companies, networking opportunities with peers, case studies from the "front lines," and deep dives into selected topics.