Web Scraping Tutorial

The first part of a four-part tutorial on the joys of web scraping. It starts off a little slow, it is true. But while it may not seem likely to amount to much at first, you’ll change your tune when you see what we do with this scraping procedure in sections 3 and 4.

For the time being, though: Check out the tutorial in a Jupyter Notebook.

Text file tutorial

Code is Hard

I’m sure that some of you feel overwhelmed as you work to master these weird, radically new ideas. It is OK to feel that way. Just keep chipping away at these fundamentals: Pretty soon some of it will start to feel familiar.

Not that it will ever feel like home: This technology is evolving too quickly for that. But it doesn’t have to. I’ll never ask you to write code from memory or invent code on the spot; that’s not what we’re trying to accomplish here. You will always have access to the references and resources you’ve come to trust, and you’ll always be able to lean on others to find your answers. This is a class about research, not about programming. In the long term, I do think that the code we write has the potential to open up, in each of us, new vectors of reason and novel types of critique. But for now, refuse to let yourself be intimidated by it.

The link below will take you to a tutorial that picks up where we left off on Thursday. It does a reasonable job of explaining how to open and read data files in Jupyter. I’ll post a few more of these this afternoon (Sunday, 11 February).

binder

The link itself is actually very interesting to us: I’m using a Jupyter notebook hosting service called “myBinder.” It’s new to me, and still in beta, so I’m sure there are some kinks to work out, but it makes it very easy to share your work with your peers as a live, interactive notebook. I just put my .ipynb file on GitHub and tell Binder to host it.

Bonus “It’s a Small World” detail: Guess what happens in the interim? Right: myBinder fires up Docker, and builds your notebook into a custom Docker container that is then launched on demand. See? It all comes full circle!

Tutorial

Work through the tutorial to learn the basics of loading a data file. Sometimes it will seem overly complicated, but don’t be discouraged: Once we get these first principles out of the way, the rest will come much more easily, and we can focus on the research and our arguments themselves, rather than on Python.

For the nonce, though: IO_Tutorial_One
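
If you want a taste before opening the notebook, here is a rough sketch of what loading a text file can look like in Python. It is not taken from the tutorial itself: the file name sample.txt and the helper load_lines are made up for illustration, and the sample file is written to a temporary folder only so that the cell runs anywhere.

```python
from pathlib import Path
import tempfile


def load_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace.

    Hypothetical helper for illustration; the tutorial may do this differently.
    """
    # "with" closes the file for us, even if something goes wrong mid-read.
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a small sample file so the example does not depend on any real data.
sample_path = Path(tempfile.mkdtemp()) / "sample.txt"
sample_path.write_text("first line\n\nsecond line\n", encoding="utf-8")

lines = load_lines(sample_path)
print(len(lines), "lines loaded")
print(lines[0])
```

In a notebook you would normally point load_lines at a file sitting next to your .ipynb rather than at a temporary one.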