## Free Internet resources

The Data Science Process: What a data scientist actually does day-to-day

Dream job in Data Science with Python: "Excellent understanding of probability"

### Regression

- Galton's 1886 paper "Regression towards Mediocrity ..."
- Animated explanation of linear least-squares regression

### Pandas

- Pandas: Data Analysis with Python (crash course in NumPy and pandas)
- The official pandas crash-course: 10 Minutes to pandas
- Greg Reda's Intro to pandas data structures
- 14 Best Python Pandas Features, at Dataconomy.com
- Manish Amde's Pandas and Python: Top 10
- Python for Data Analysis - a book written by pandas' creator, Wes McKinney
- Cookbook and lessons at the pandas website

### Regular Expressions (regex) and NLTK

- Many examples, by category, at regular-expressions.info
- Web-based regex testers: here and here
- Short lessons with interactive assignments at regexone.com
- EditPad Lite - free text editor with great regex support
- Natural Language Processing with Python - book written by the developers of NLTK

### Character codes:

- Latin-1 compact table
- Latin-1 with hex and decimal codes
- All pages of Unicode
- Romanian characters

#### Data Science learning resources

- PyData YouTube channel - hundreds of detailed video tutorials!
- Harvard CS 109 - Data Science has great lecture slides and lab problems!

#### Degrees in Data Science

- Data Science degrees
- Masters in Data Science
- TSU offers M.S. in Mathematical Data Mining

#### Trends in Data Science

- 2009 - The Fourth Paradigm (whole book as PDF - see the two intro pieces and Jim Gray's bio at the end)
- The Inflexion Point for DS software: pay attention to the year 2010!
- 2016 - worthy attempt to define DS (Yes, it has Venn diagrams!)
- 2017 - the year when Python overtook R in DS

### Unicode and UTF-8

- Ned Batchelder's Pragmatic Unicode (presentation)
- UTF-8 encodings table for the first 256 Unicode code-points (ASCII and Latin 1)
- All Unicode code-points in the Basic Multilingual Plane (0x0000-0xFFFF), divided into categories
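
As a small illustration of what the UTF-8 table above shows, here is a minimal sketch in Python 3. The helper name `utf8_hex` is made up for this example and does not come from any of the linked resources. It prints the UTF-8 bytes for a few code points: ASCII characters (0-127) take one byte, while the rest of Latin-1 (128-255) takes two.

```python
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as space-separated hex bytes.

    Hypothetical helper written for this illustration only.
    """
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{byte:02x}" for byte in encoded)


# One ASCII letter and two Latin-1 letters.
for cp in (0x41, 0xE9, 0xFF):
    print(f"U+{cp:04X} {chr(cp)!r} -> {utf8_hex(cp)}")

# Count how many bytes each of the first 256 code points needs.
one_byte = sum(1 for cp in range(256) if len(chr(cp).encode("utf-8")) == 1)
two_bytes = sum(1 for cp in range(256) if len(chr(cp).encode("utf-8")) == 2)
print(one_byte, two_bytes)
```

The last line should report 128 one-byte and 128 two-byte code points, which is why a file saved as Latin-1 and read back as UTF-8 (or the reverse) garbles only the accented characters.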

### NumPy

- NumPy for MATLAB users at scipy.org
- Moving from MATLAB matrices to NumPy arrays - nice, small examples
- Tentative NumPy Tutorial at scipy.org

### Web-based Python IDEs:

- Trinket - includes Turtle graphics window!
- ideone - choose Python for Python 2.x
- Coding Ground at tutorialspoint.com
- PythonAnywhere (IPython)

### Learning Python:

- Learn Python the hard way (HTML and low-cost PDF)
- The Python Practice Book
- Learning to Program by Alan Gauld - starts simple, but covers several advanced topics (recursion, event-driven programming)
- The EU Python course (tutorial and advanced topics)
- Straight from the horse's mouth: the official Python Tutorial (HTML)
- Byte of Python (HTML and free PDF, choose version 2.x)
- From the Python Language Reference (python.org):
  - Turtle module (graphics)
  - The math functions
  - Strings (including single, double and triple quotes)
- Lists at Dive Into Python

### IPython Notebook:

- Two tutorials at ipython.org
- R. Olson's page (tutorial and statistics examples)
- Reddit thread on philosophy of use
- GitHub repository of interesting notebooks, including a section on Statistics, Machine Learning and Data Science

### Matplotlib

- The official pyplot tutorial
- pyplot documentation - all methods and attributes
- List of all named colors and list of all marker styles
- Matplotlib gallery - find the plot you need and copy the code!
- Customizing Matplotlib - rcParams

### From your instructor

Homework #6 is due Fri, Nov. 10

Data files:

- emails1.txt emails2.txt
- cities_distances.txt
- data_0.txt
- iris.txt
- T.S.Eliot - The Waste Land
- Thomas Hardy - The Mayor of Casterbridge
- World cities database at simplemaps.com
- Countries and continents at wikipedia.org

Code files:

- Two versions of filling_NAN_gaps (week11 and week12), needed for the Pandas module