Coursera Machine Learning In Python (Exercise 1)

I have previously done the Coursera Machine Learning exercises in Matlab. I thought, now that I am starting to get away from Matlab and use Python more, I should re-do the exercises in Python. This is exercise 1.

In [447]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy

Part 1: Create an identity ("eye") matrix. While this is incredibly simple, I want to go through each step and end up with a document that a novice can follow along with and understand what is happening.
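As a minimal sketch of this warm-up step: the 5x5 size below is an assumption (nothing above fixes the size), and the variable names are only illustrative.

```python
import numpy as np

# Assumed size for illustration; the text above does not specify one.
n = 5

# np.eye(n) returns an n x n array with ones on the main diagonal
# and zeros everywhere else (the identity matrix).
A = np.eye(n)
print(A)
```

Multiplying any compatible matrix by `A` leaves it unchanged, which is a quick way to convince yourself that the result really is an identity matrix.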


Scrape Keywords from Job Postings

Job Posting Crawler

This code pulls each job posting for a specific job title in a specific location (or nationally) and returns and plots the percentage of postings that contain certain keywords. The code is set up to search for all words except stopwords and other user-defined words (there is probably a much more efficient way of doing this, but I had no need to change it once I had the code running). This allows the user to see common technical skills, as well as common soft skills, that should be included on a resume.

NOTE: I got this idea from Obviously, just using his code would be of no real benefit to me, as I wanted to use the idea to help better my skills with scraping data from HTML files. So, I used his idea and developed my own code from scratch. I also modified the overall process a bit to better fit my needs.

NOTE2: This code will not be able to identify multiple-word skills. So, for example, ‘machine learning’ will show up as either ‘machine’ or ‘learning’. However, ‘machine’ could show up for other phrases than ‘machine learning’.

To run the code, change the city, state, and job title to whichever you wish. After generating the plot, you might need to add ‘keywords’ to the attitional_stop_words list if you do not want them to be included.
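The counting step described above might look roughly like the sketch below. It is not the code from the post: the sample postings, the tiny stopword set, and the names `keyword_percentages` and `extra_stop_words` are all hypothetical, and the scraping itself is left out so the example runs offline.

```python
import re
from collections import Counter

# Hypothetical stand-ins for scraped posting text; a real run would fetch these.
postings = [
    "Experience with Python and machine learning required.",
    "Strong communication skills; Python and SQL preferred.",
    "Machine operators needed. Communication is key.",
]

# Tiny illustrative stopword set; the post does not say which list it uses.
stop_words = {"and", "with", "is", "the", "a"}
# Hypothetical user-defined exclusions, added after inspecting the plot.
extra_stop_words = {"required", "preferred", "needed"}


def keyword_percentages(texts, excluded):
    """Return {word: percentage of postings that contain the word}."""
    counts = Counter()
    for text in texts:
        # A set makes each word count at most once per posting.
        words = set(re.findall(r"[a-z+#]+", text.lower()))
        counts.update(words - excluded)
    return {word: 100.0 * n / len(texts) for word, n in counts.items()}


percentages = keyword_percentages(postings, stop_words | extra_stop_words)
for word, pct in sorted(percentages.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{word}: {pct:.0f}%")
```

Note how 'machine' is counted for both the 'machine learning' posting and the 'machine operators' posting, which is exactly the single-word limitation described in NOTE2.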


Backup with rsync using SSH Tunneling

For those of you who read my blog often, you know that I administer the cluster that our research group uses here at CU Boulder. Because of this, I get a lot of questions from users who don’t want to take the time to solve their own problems. Fairly recently, our RAID-6 array crashed (a 4th drive died and we had to rebuild the array). Normally this wouldn’t be much of a problem, as most of the files saved on our storage drive are just input files that we can re-download from a separate server, or so I thought. Personally, I keep all my source code in my home folder, backed up on our data server and backed up again onto my personal laptop. For researchers in our group who are developing code, not having a backup of source code can mean many months of lost work. As it turns out, many of the people in our group had their source code on our data server (the one that crashed), without a backup anywhere, so months of work were lost. Since the rebuild, I have gotten many questions about how to set up an SSH tunnel so that users can back up from our cluster, through the front end, to their home computers.
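One common way to do this is a local port forward through the front end, followed by an rsync that connects to the forwarded port. The sketch below only builds and prints the two commands; it is not necessarily the approach the full post takes, and the user name, host names, paths, port 2222, and the helper names `tunnel_command` and `rsync_command` are all hypothetical.

```python
import shlex


def tunnel_command(user, front_end, data_server, local_port=2222):
    """ssh command forwarding local_port to the data server's SSH port via the front end."""
    return [
        "ssh", "-f", "-N",                        # go to background, run no remote command
        "-L", f"{local_port}:{data_server}:22",   # local port forward through the front end
        f"{user}@{front_end}",
    ]


def rsync_command(user, remote_path, local_path, local_port=2222):
    """rsync command that pulls remote_path through the forwarded local port."""
    return [
        "rsync", "-avz",
        "-e", f"ssh -p {local_port}",             # make rsync's ssh use the tunnel port
        f"{user}@localhost:{remote_path}",
        local_path,
    ]


# Hypothetical names; substitute your own account, hosts, and paths.
tunnel = tunnel_command("alice", "frontend.example.edu", "datanode")
backup = rsync_command("alice", "/home/alice/src/", "/Users/alice/backup/src/")
print(shlex.join(tunnel))
print(shlex.join(backup))
```

Both commands would be run on the home computer: the first opens the tunnel, and the second copies the files through it.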



Quit after closing last window

I use Terminal on a daily basis, mostly for my research, as I am currently building an adjoint of the CMAQ model. I like to have 4+ windows open at all times while working in Terminal, as it allows me to watch the progress of a simulation while also editing files, compiling, etc. I have macros set up that log me into and out of each terminal window. One frustration I ran into was that, after the macro logged me out of the final window, I wanted the Terminal application to quit. I could easily have added the Quit Application command to the macro, but I was hoping for a solution that would also work when the macro wasn’t called. I surfed the web for a while and found nothing, until I got a response to my post on Apple’s forums.

I am copying the response to this website; however, all credit for the material goes to François J. Perreault, who answered the question.

Here’s how to have Terminal quit automatically after closing all your shells:

  1. Create a new text file in your home folder named “autoQuitTerminal.scpt”:
    tell application "Terminal"
      --If there is only one tab remaining, and it contains
      --the word "logout" then this is the final window
      if (count of (tabs of (every window whose visible is true))) = 1 then
           try
                set theContents to words of ((contents of tab 1 of window 1) as Unicode text)
                set exitLastTab to (theContents contains "logout")
           on error
                set exitLastTab to false
           end try
           if exitLastTab is true then
                quit
           end if
      else if (count of (tabs of (every window whose visible is true))) < 1 then
           --If no window remains open, then obviously we can quit the app.
           --This would occur when the final window is closed without "exit"
           quit
      end if
    end tell


Add Infiniband interface to ifconfig

We recently had to rebuild our RAID-6 array. After the rebuild, our cluster did not automatically locate and mount our high-capacity storage array. To fix this problem, we had to add a new interface configuration file for the InfiniBand device by following the steps below:

1. As root on the server that will be connected to the high-capacity storage server, open the interface configuration file:

vi /etc/sysconfig/network-scripts/ifcfg-ib0

Add the following to the file:


Of course your BROADCAST, IPADDR, and NETMASK will be different from those set here.
Some Notes:

  • The filename is ifcfg-ib0 for the configuration file for device ib0 (note that the last character is a zero, not the letter o).
  • BROADCAST is the broadcast IP address.
  • NETMASK is the netmask.
  • BOOTPROTO is the boot protocol, where the value is one of the following: (a) none – no boot-time protocol should be used, (b) dhcp – the DHCP protocol should be used, (c) static – the IP address is set statically.
  • IPADDR is the IP address
  • ONBOOT specifies if the interface needs to be active on boot (values: yes or no)
  • TYPE is the interface type
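To make the relationship between these keys concrete, here is a small sketch that derives BROADCAST from IPADDR and NETMASK and prints a file in `KEY=value` form. It does not reproduce the actual file contents from this post: the address, netmask, the `DEVICE` line, and the `TYPE=InfiniBand` value are all assumptions for illustration.

```python
import ipaddress

# Hypothetical address and netmask; substitute the values for your own network.
ipaddr = "192.168.10.5"
netmask = "255.255.255.0"

# The broadcast address follows from the IP address and netmask,
# so it can be derived instead of typed by hand.
iface = ipaddress.ip_interface(f"{ipaddr}/{netmask}")
broadcast = str(iface.network.broadcast_address)

settings = {
    "DEVICE": "ib0",        # assumed: device name matching the ifcfg-ib0 filename
    "TYPE": "InfiniBand",   # assumed value for the interface type
    "BOOTPROTO": "static",
    "IPADDR": ipaddr,
    "NETMASK": netmask,
    "BROADCAST": broadcast,
    "ONBOOT": "yes",
}

# Render one KEY=value pair per line, as ifcfg-* files expect.
ifcfg_text = "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"
print(ifcfg_text)
```

Deriving the broadcast address this way is just a convenience check; a mismatch between BROADCAST, IPADDR, and NETMASK is an easy mistake to make when editing the file by hand.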