Tuesday, April 24, 2012

Hosting your html on Github

If you want to host a webpage on Github, here's how I do it.

For example, say the documentation for some project you've done in Python has been generated using something like Sphinx, or you've just got some HTML files. Either way, you want to host an online version of this documentation as a pre-built website on GitHub.

This is especially useful if you've made some changes to the documentation/website of a Github repository, and you want to show it easily to your peers for review, without having them first fork it and build it in order to see what you've done.
Note that these instructions are based on how I do it on Ubuntu 11.10.
  1. Firstly, on your GitHub account (your profile page, where your forks and repositories are shown), create a new repository.
  2. Give it a name, for example project_documentation_online_build.
  3. A set of instructions should appear, the first of them being Global setup, which I'm assuming you've done already. Follow the Next steps instructions in your terminal to create a directory with the same name as the repository (in my case, project_documentation_online_build). They should go a little something like this:

    mkdir project_documentation_online_build
    cd project_documentation_online_build
    git init
    touch README
    git add README
    git commit -m'first commit'
    git remote add origin git@github.com:YourUserName/project_documentation_online_build.git
    git push -u origin master

  4. You can then create a new branch for this repository called gh-pages like so:
    git symbolic-ref HEAD refs/heads/gh-pages

  5. Switch to this branch:
    git checkout gh-pages

  6. Now you can place all your HTML files here, for example the HTML build of your Sphinx-generated documentation. Once you've placed it all here, you can:

    git add .
    git commit -a -m "First pages commit"
    git push origin gh-pages
Once you've done this, you should receive a message from GitHub informing you that your pages have been successfully built (usually after a couple of minutes).
A wee bit later you should be able to check it at http://YourUserName.github.com/project_documentation_online_build/

Note1: If your page built but it's missing its theme and pretty colours, add a blank file called .nojekyll to the main directory in your gh-pages branch and try pushing it again.
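The reason this helps with Sphinx output is that Sphinx keeps its stylesheets and scripts in directories whose names start with an underscore (such as _static), and GitHub's Jekyll step skips those unless a file named .nojekyll is present. Here is a minimal sketch of creating that marker file from Python rather than by hand. The function name disable_jekyll is my own invention, and the temporary directory is only a stand-in for your gh-pages checkout.

```python
from pathlib import Path
import tempfile


def disable_jekyll(site_dir):
    """Create an empty .nojekyll file at the top of site_dir and return its path."""
    marker = Path(site_dir) / ".nojekyll"
    marker.touch(exist_ok=True)  # creates the file if missing, leaves it alone otherwise
    return marker


# Demonstration in a throwaway directory standing in for the gh-pages checkout
with tempfile.TemporaryDirectory() as tmp:
    marker = disable_jekyll(tmp)
    created = marker.exists()
    size = marker.stat().st_size
```

After running it against your real directory, the file still has to be added, committed and pushed like any other.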

Note2: This is just what works for me. If you have any problems, I recommend you check out the GitHub Pages documentation. These instructions, together with those that are displayed upon creating a new repository, are pretty much what I'm summarising here.

Friday, April 20, 2012

Informative features

Let's say you want to generate a synthetic data-set to play around with for classification,
and you set

n_samples = 100
n_features = 1000



and you generate the following data
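import numpy as np
import matplotlib.pyplot as plt
# Two Gaussian blobs: one centred at 0, the other shifted by 5 in every feature
X1 = np.random.randn(n_samples // 2, n_features)
X2 = np.random.randn(n_samples // 2, n_features) + 5
X = np.append(X1, X2, axis=0)
np.random.shuffle(X)  # shuffle the rows in place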

plt.scatter(X[:,0], X[:,1])
plt.show()




For a binary classification, the function which determines our labels is \[y = \operatorname{sign}(X \cdot \omega)\]
where \(\omega\) is our vector of coefficients.
For now, let's set our coefficients equal to a bunch of zeros:
coef = np.zeros(n_features)
If we want, say, 10 informative features, we can set 10 of our coefficients to a non-zero value. When we then take the dot product with our data, X, only the features with non-zero coefficients contribute to the result, so those 10 are our informative features, while the rest, which are multiplied by zero, are not informative.

So,

coef[:10] = 1
y = np.sign(np.dot(X,coef))


will give us our corresponding labels such that we have 10 informative features.
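One way to convince yourself that only those 10 features matter is to scramble all the other columns and check that the labels do not change. The following is a small sketch of that check, not part of the recipe itself. The seed, the variable names and the use of plain zero-mean Gaussian data are my own assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n_samples, n_features, n_informative = 100, 1000, 10

X = rng.standard_normal((n_samples, n_features))

# Only the first n_informative coefficients are non-zero
coef = np.zeros(n_features)
coef[:n_informative] = 1

y = np.sign(X @ coef)

# Replace every non-informative column with fresh noise
X_scrambled = X.copy()
X_scrambled[:, n_informative:] = rng.standard_normal(
    (n_samples, n_features - n_informative)
)
y_scrambled = np.sign(X_scrambled @ coef)

# The labels depend only on the informative columns, so they should match
labels_unchanged = np.array_equal(y, y_scrambled)
print(labels_unchanged)
```

Scrambling any of the first 10 columns instead would, in general, flip some of the labels.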
A way to visualise this is to use the f_classif function from the Scikit-learn package.
If you have the Scikit-learn package installed, do the following:

from sklearn.feature_selection import f_classif
F, pval = f_classif(X, y)  # F-scores and p-values, one per feature
plt.plot(F)
plt.show()



Here you can see that the first 10 features get the highest scores, i.e. they are rated as the most informative.
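If you'd rather check this numerically than read it off a plot, you can rank the features by their score. f_classif computes a one-way ANOVA F-value per feature, so the sketch below does the same thing by hand with NumPy in order to stay self-contained. The helper name anova_f, the seed and the data sizes are my own assumptions, and I use zero-mean data with more samples than above so that the ranking comes out cleanly.

```python
import numpy as np


def anova_f(X, y):
    """One-way ANOVA F statistic for each column of X, grouping rows by label y."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += len(Xc) * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Between-class variance over within-class variance
    return (ss_between / (k - 1)) / (ss_within / (n - k))


rng = np.random.default_rng(0)  # arbitrary seed
X = rng.standard_normal((2000, 50))
coef = np.zeros(50)
coef[:10] = 1
y = np.sign(X @ coef)

F = anova_f(X, y)
top10 = np.argsort(F)[::-1][:10]  # indices of the 10 highest-scoring features
print(sorted(top10.tolist()))
```

With this much data the ten informative columns should come out on top. With fewer samples, a noise feature can occasionally sneak into the top 10 by chance.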