See SoN-2014-Projects for project descriptions.
Contents
- Reconstruction
- Benchmarking and Visualizing HTMs
- Image Interpretation
- Insights into the CLA
- Spatial Pooler OCR
- Simple AI for Games
- Epilepsy Seizure Prediction
Reconstruction
- Students:
- Brendan Berman (United States)
- Ross Story
- Mentor: Marek Otahal
Repository
https://github.com/Random-Word/nupic
Progress
In the first week I laid out the test script for reconstruction and assessed how much of the required functionality was already present. Only one function — the SP function for conversion from an SDR to an encoder bit string — was missing, and that’s where I ended up stalled.
Completion
Completing the project shouldn’t be a problem if I find time to properly work out the best way to go backwards at that step. Once that’s done, reconstruction will work, and the final step will be finding where in the source the other evaluation metrics live and including them with the rest, plus adding any relevant unit tests. I might also include a standardized function in each encoder with a properly chosen distance metric for computing error values appropriate to the input type.
Blockers
The only issue preventing progress is the aforementioned puzzle. I just need to find the time to solve it.
Benchmarking and Visualizing HTMs
Full Title: Benchmarking and Visualizing HTMs in JS on MNIST dataset by exploring hyper-parameter space
- Student: Curtis SerVaas (United States)
- Mentor: Ian Danforth
NO MID-TERM REPORT RECEIVED FROM STUDENT OR MENTOR.
Image Interpretation
- Student: Steven Karapetyan (USA)
- Mentor: Fergal Byrne
NO MID-TERM REPORT RECEIVED FROM STUDENT OR MENTOR.
Insights Into The CLA
- Student: Ruaridh O’Donnell (Scotland, UK)
- Mentor: Chetan Surpur
Project File Locations
The documentation for the project will be hosted on GitHub as well (in the attached wiki or through IPython notebooks).
Current Project Status
This project had a delayed start because Ruaridh was busy with exams, but it is now underway. It’s still in its early stages, but the plan has been refined. The plan differs slightly from the original description: the focus is now on creating a library of visualisations for use with NuPIC.
Will Everything Be Finished In Time?
Although this project started late, it should still produce useful results. There is no precise amount of work that has to be done to call the project complete: the more time there is, the more visualisations can be written. So by August there should be something useful.
Barriers To Progress
Currently there are no obvious barriers to progress.
Spatial Pooler OCR
- Student: Jim Bridgewater (United States)
- Mentor: Scott Purdy
Progress
Jim recently posted his own mid-term report to the NuPIC discuss list. That includes an overview of his work so far. Jim has familiarized himself with some of what is available in NuPIC and started to get a simple vision experiment set up. He has worked very independently and picked up the work very quickly. Overall the work is very promising.
Concerns
There are a few concerns that I have raised with him previously but I don’t think they reflect any problem with progress on the core task. Specifically, the model size hasn’t reached the “SDR” level. He was working off Ian’s first SP demo that only had eight columns. The most recent report was using 62 columns, but that still isn’t enough to really achieve the properties of SDRs. Further, with only one column active at a time, the model was likely acting more as a KNN classifier.
But I believe the project is well on its way and we should have a good start point for vision experimentation by the end of the summer.
Next Steps
Jim has laid out some of his next steps, which I reiterate and expand on here:
- Increase the model to 2048 columns with 40 active (we can experiment with other numbers in the future, but I wouldn’t go smaller than this for now).
- Use the KNNClassifier from NuPIC for training and classifying images.
- Use a performance metric to determine when the SP has stopped learning instead of a hash, which is not guaranteed to converge and is useless for determining incremental improvement.
- Build onion charts for evaluating classification tolerance to different types of transformations of images.
- Use the network API instead of directly instantiating the SP. This will be useful for future experimentation.
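As an illustration of the third step, here is a minimal sketch of one possible stability measure: the mean fraction of active columns that each input keeps between two training passes. The function name `stability` and the toy data are hypothetical, and this is not the metric Jim or NuPIC actually uses. It only shows why a graded measure is more informative than an exact-match (hash-style) check.

```python
def stability(previous_pass, current_pass):
    """Mean fraction of each input's active columns kept between two passes."""
    fractions = []
    for key, before in previous_pass.items():
        after = current_pass[key]
        shared = len(set(before) & set(after))
        fractions.append(shared / len(before))
    return sum(fractions) / len(fractions)


# Target model size from the list above: 40 active columns out of 2048.
NUM_COLUMNS = 2048
NUM_ACTIVE = 40
sparsity = NUM_ACTIVE / NUM_COLUMNS  # about 2% of columns active

# Toy data (hypothetical): active column indices per image on two passes.
pass_1 = {"img_a": [3, 17, 42, 99], "img_b": [5, 8, 13, 21]}
pass_2 = {"img_a": [3, 17, 42, 100], "img_b": [5, 8, 13, 21]}

# An exact-match check only says "changed" or "unchanged".
exact_match = pass_1 == pass_2

# The overlap-based score shows how close the SP is to settling.
score = stability(pass_1, pass_2)
```

With the toy data, a single changed column makes the exact-match check fail, while the overlap score still reports that most of the representation is unchanged. Training could then stop once the score stays above a chosen threshold for several passes.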
Simple AI for Games
- Student: Fernando Martinez (United States)
- Mentor: Matt Keith
See Fernando’s report on Google Docs.
Epilepsy Seizure Prediction
- Students:
- Anubhav Chaturvedi (India)
- Kaggle Team
- Mentor: Matt Taylor
Source Code
This exists within a private repository at https://github.com/fergalbyrne/nupic.kaggle-eeg because of Kaggle Rules.
Project Structure Summary
This SoN project was incorporated into a team for a Kaggle Competition. See the Epilepsy Seizure Prediction wiki page for details on the team.
This team is having regular planning meetings over Google Hangouts and Campfire. Project planning is being managed by Anubhav on Trello.
Project Progress
Much of the work has been on understanding the EEG input data, massaging it into the proper formats for NuPIC, and swarming over the data in an attempt to get decent model parameters. Research papers have been posted for the group to review on the subject of EEG, and Anubhav has created two blog posts on relevant subjects.
The team is using the pyeeg library for pre-processing of the EEG data.
Summary
I think the team is making progress, but the crucial period is coming where they need to settle on the process by which they’ll produce results. Once model parameters suitable for the input data are identified, a repeatable process should be established that can be easily run within the project source code to identify anomalies in the EEG data.
Team Member Reports
From Anubhav:
I have written and pushed code to concatenate data into single csv files, find min and max values across multiple files combined, and generate all the statistical parameters for a given patient. I have also looked into how EEG works and am currently working on using the pyeeg library. I have also posted the research papers I referred to on Campfire and created two blog posts regarding this project.
From Ross:
I made a small update to the data processing pipeline to include a few statistical features found useful in the literature (MAD, IQR) as well as a column for class label and accurate per-sample timestamps. I added a few simple scripts to aggregate data together into single csvs for easier processing. I looked into continuous wavelet decomposition with online matching pursuit at Doug’s suggestion and implemented a simple version in Python, but haven’t had time to try converting it into a proper NuPIC encoder.
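The MAD and IQR features Ross mentions could be computed per window along these lines. This is a minimal sketch using only the Python standard library, not the team's pipeline code. The report does not say which definition of MAD is meant, so mean absolute deviation from the mean is assumed here. The function names and the `label` field are hypothetical.

```python
import statistics


def mean_absolute_deviation(samples):
    """Average absolute distance of each sample from the window mean (assumed MAD definition)."""
    center = statistics.fmean(samples)
    return statistics.fmean([abs(x - center) for x in samples])


def interquartile_range(samples):
    """Distance between the first and third quartiles of the window."""
    q1, _median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return q3 - q1


def window_features(samples, label):
    """One output row: statistical features plus a class label column."""
    return {
        "mad": mean_absolute_deviation(samples),
        "iqr": interquartile_range(samples),
        "label": label,
    }


# Toy window of readings (hypothetical values, not real EEG data).
row = window_features([1, 2, 3, 4, 5, 6, 7, 8, 9], label=0)
```

Each row could then be written out with `csv.DictWriter`, alongside a per-sample timestamp column.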
From Sergey:
I introduced an alternate CSV concatenation that can work with tar files. I’m going to take a close look at the pyeeg lib to see how we can feed data to it. I also plan to bring in the FFT from the NuPIC audio example.
From James:
I wrote an initial mat to csv conversion utility and made an initial graphing tool to examine the data (I’ll upload the newest copy today once I’ve finished cleaning it up; it makes a completely unnecessary 3D graph of all the electrodes plotted next to each other). Over the last week I’ve attempted various swarm and run_models runs on each of the columns in the csv (a random sampling of electrodes, Mean Absolute Difference, and Interquartile Difference). So far NuPIC hasn’t been accurate in its predictions for any of the columns. I wasn’t expecting much accuracy, but I wanted to copy hotgym’s swarm/run over and have it ready to work on the MATLAB datasets for when we do have a better set of analytics to input into NuPIC. I’ve also played a bit with the FFT examples (mainly making graphs and extracting the data with numpy). I’m hoping to have a swarm and model based on it by the end of the week. As my skillset is more programming oriented (I took a few college biology classes about 15 years ago), I’m mostly not trying to analyze the EEG data. I’m afraid that by the time I’m up to speed on current analysis techniques in that area, this project will be long done.
My code really needs to be cleaned up (thus my not checking it in yet), and I’m hoping to have that chore done by our next gathering. I’ll be adding a nupic dir (with swarm.py, the associated swarm files, and run.py with a “decent” model params file that should be easily modifiable for whatever needs to be graphed), and cleaning up dataExaminer.py and probably something for FFT analysis. If anyone is interested in any of the above, take a look at my GitHub:
https://github.com/moorejpdx/nupic.kaggle-eeg
From Doug:
I have researched EEG feature extraction literature and examples to determine methods and algorithms that would be a good fit for NuPIC encoding. Simple FFT would be a good starting point because we have some code already, but FFT is problematic as we would have to choose our frequencies of interest. Wavelet compression with Matching Pursuit would be better as it identifies the frequencies that contribute most to a signal. Team members have found several libraries that do this and could be integrated into our tool chain. Another feature extraction method was found that has potential but needs a closer look: the Fractal Dimension value. The pyeeg library has a couple of different implementations of Fractal Dimension as well as several other feature extraction methods popular in the EEG literature. References to some of these papers have been posted.
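To make the FFT limitation Doug describes concrete, here is a minimal sketch of a band-power feature. It uses a naive O(n²) DFT from the standard library rather than a real FFT routine, and the band edges are hypothetical. It is not the team's code and not the NuPIC audio example. Note that the caller must pick the band edges up front, which is exactly the "choose our frequencies of interest" problem.

```python
import cmath
import math


def dft_power(signal):
    """Power at each non-negative frequency bin, via a naive O(n^2) DFT."""
    n = len(signal)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        powers.append(abs(coeff) ** 2 / n)
    return powers


def band_power(signal, sample_rate_hz, low_hz, high_hz):
    """Total power in bins whose frequency lies in [low_hz, high_hz]."""
    n = len(signal)
    total = 0.0
    for k, power in enumerate(dft_power(signal)):
        freq_hz = k * sample_rate_hz / n
        if low_hz <= freq_hz <= high_hz:
            total += power
    return total


# Synthetic test signal (not EEG data): a 10 Hz sine sampled at 100 Hz for 1 s.
SAMPLE_RATE_HZ = 100
signal = [math.sin(2 * math.pi * 10 * t / SAMPLE_RATE_HZ) for t in range(100)]

# Hypothetical bands of interest, chosen by hand.
in_band = band_power(signal, SAMPLE_RATE_HZ, 8, 12)
out_of_band = band_power(signal, SAMPLE_RATE_HZ, 20, 30)
```

For the synthetic signal, nearly all of the power falls in the 8 to 12 Hz band. A signal whose energy sat outside every hand-picked band would be missed, which is the motivation for the matching pursuit approach.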