I came across the COVID-19 Open Research Dataset Challenge (CORD-19) today. It has a few associated tasks designed to help the medical community develop answers to high-priority scientific questions.
Curious if anyone here is working with this data set?
My initial thought on where folks in this community might be able to help is in the realm of NLP. The tasks are all about sifting through a large number of articles to locate information that can be used to answer a list of questions. This looks like a good candidate for word SDRs created through semantic folding.
Word SDRs can be stacked and sparsified (Cortical.io has some videos on YouTube which explain how to do this), allowing one to create semantically relevant SDRs for sentences, paragraphs, papers, and/or collections of papers. SDRs can also have the bits from other SDRs subtracted from them, which allows one to identify the main islands of semantic meaning.
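A minimal sketch of stacking, sparsifying, and subtracting, assuming each SDR is represented as a Python set of its active bit positions. The names `stack_and_sparsify`, `subtract`, and `max_active` are hypothetical, and keeping the most frequently active bits is just one simple way to sparsify; it is not necessarily the exact procedure Cortical.io uses.

```python
from collections import Counter
from typing import Iterable, Set

SDR = Set[int]  # hypothetical representation: the set of active bit positions


def stack_and_sparsify(sdrs: Iterable[SDR], max_active: int) -> SDR:
    """Stack SDRs by counting how often each bit is active, then keep the top bits."""
    counts: Counter = Counter()
    for sdr in sdrs:
        counts.update(sdr)
    # Highest counts first; ties broken by bit position so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {bit for bit, _ in ranked[:max_active]}


def subtract(sdr: SDR, other: SDR) -> SDR:
    """Remove every bit of `other` from `sdr`, leaving what is unique to `sdr`."""
    return sdr - other


# Toy example with tiny made-up SDRs (real word SDRs are far larger).
words = [{1, 2, 3, 4}, {2, 3, 5, 6}, {3, 4, 6, 7}]
sentence = stack_and_sparsify(words, max_active=3)
print(sorted(sentence))                 # [2, 3, 4]
print(sorted(subtract(sentence, {3})))  # [2, 4]
```

Bit 3 is active in all three toy words, so it survives sparsification first; bits 2 and 4 win the tie among the bits that appear twice only because of the position-based tie-break.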
Some places where this could be useful for CORD-19 are:
- A person can break a question down into a set of keywords, which can then be encoded into an SDR and used to search for potentially relevant documents.
- If one article is found which is of particular interest, one could easily search for and return other documents in the data set which are most similar to it in semantic meaning.
- A set of similar documents can be encoded into a single SDR; one can then find the most similar word, subtract the SDR for that word, and repeat, identifying the islands of semantics that are common to the group of documents. This could quickly surface correlations between documents that might otherwise take a lot of deeper reading and study to identify, and the results could then feed back into the search keywords.
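The search and "islands of semantics" ideas in the list above can be sketched as follows, again assuming SDRs are sets of active bit positions. Raw overlap (shared active bits) is used as the similarity score as an assumption; a normalized measure such as cosine similarity would be a reasonable alternative. The function names, documents, and vocabulary are all hypothetical toy data.

```python
from typing import Dict, List, Set

SDR = Set[int]  # hypothetical representation: the set of active bit positions


def overlap(a: SDR, b: SDR) -> int:
    """Number of active bits shared by two SDRs (a simple similarity score)."""
    return len(a & b)


def most_similar(query: SDR, docs: Dict[str, SDR], n: int) -> List[str]:
    """Return the names of the n documents whose SDRs overlap most with the query."""
    ranked = sorted(docs, key=lambda name: (-overlap(query, docs[name]), name))
    return ranked[:n]


def semantic_islands(group: SDR, vocab: Dict[str, SDR], max_words: int) -> List[str]:
    """Repeatedly pick the best-matching word, subtract its bits, and continue."""
    remaining = set(group)
    words: List[str] = []
    while vocab and len(words) < max_words:
        # Largest overlap wins; ties are broken alphabetically.
        best = min(vocab, key=lambda w: (-overlap(remaining, vocab[w]), w))
        if overlap(remaining, vocab[best]) == 0:
            break  # no word explains any of the remaining bits
        words.append(best)
        remaining -= vocab[best]
    return words


docs = {"paper_a": {1, 2, 3, 4}, "paper_b": {3, 4, 5, 6}, "paper_c": {7, 8, 9}}
print(most_similar({2, 3, 4}, docs, n=2))  # ['paper_a', 'paper_b']

vocab = {"virus": {1, 2, 3, 4}, "protein": {5, 6}, "binding": {6, 10}, "weather": {9}}
print(semantic_islands({1, 2, 3, 4, 5, 6}, vocab, max_words=5))  # ['virus', 'protein']
```

In practice the `group` SDR would come from stacking and sparsifying the SDRs of the similar documents, and the returned words are candidates to feed back into the search keywords.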
If anyone is interested in working with me on this strategy to try and answer some of the questions, let me know. I will put some code up on GitHub if we find that it is actually useful for these sorts of tasks.
I think I’ll have time to try some stuff this weekend. Do you have initial results by any chance?
Also, cortical.io’s main website seems down. How can I contact the maintainers?
It works now; maybe it's just unstable. However, there no longer seems to be a way to get an API key.
Unfortunately, it looks like they removed the option to request a free API key as of January this year, and without one, requests to the Retina API that they host are throttled to something like a couple per minute. This has made it difficult to determine whether this approach is likely to work. I've sent them an email to ask what the new process is for accessing the Retina API. Will post an update when I have more info.
I've dug deeper into their new tech. I'm not sure if they are still using SDRs; SDRs are mentioned nowhere in their new material. Though the Retina API is still available. Shall I ask Cortical.io's chatbot?
Yes, their tech still uses SDRs (they call them “fingerprints”, which are encoded as “positions” of the 1-bits in the SDR). For example: http://api.cortical.io:80/rest/terms?retina_name=en_associative&term=hair&start_index=0&max_results=1&get_fingerprint=true
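To illustrate what "positions of the 1-bits" means, here is a minimal sketch that expands a fingerprint's position list into a dense bit vector. Several things are assumptions rather than confirmed API details: that the JSON response is a list of term objects with a `fingerprint.positions` field, that the `en_associative` retina is a 128 x 128 grid (16,384 bits), and that positions are laid out row-major. The sample response is made up, not real API output, and the helper names are hypothetical.

```python
import json
from typing import List, Set, Tuple

GRID_WIDTH = 128  # assumed retina dimensions: 128 x 128
GRID_SIZE = GRID_WIDTH * GRID_WIDTH


def positions_from_response(body: str) -> Set[int]:
    """Extract the active-bit positions of the first term (assumed response shape)."""
    terms = json.loads(body)
    return set(terms[0]["fingerprint"]["positions"])


def to_dense(positions: Set[int], size: int = GRID_SIZE) -> List[int]:
    """Expand a sparse set of 1-bit positions into a dense 0/1 vector."""
    bits = [0] * size
    for pos in positions:
        bits[pos] = 1
    return bits


def to_grid_coords(pos: int, width: int = GRID_WIDTH) -> Tuple[int, int]:
    """Map a flat position to (row, column), assuming row-major layout."""
    return divmod(pos, width)


# Made-up sample body; a real fingerprint has a few hundred positions.
sample = '[{"term": "hair", "fingerprint": {"positions": [2, 130, 16383]}}]'
positions = positions_from_response(sample)
print(sorted(positions))          # [2, 130, 16383]
print(sum(to_dense(positions)))   # 3
print(to_grid_coords(130))        # (1, 2)
```

Keeping only the positions is what makes the format compact: the set representation used for overlap and subtraction can be built directly from the list without ever materializing the dense vector.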
They reference these “fingerprints” in their product descriptions. For example, the page for semantic search (see the diagram):
@Paul_Lamb Did they reply by any chance?
Sadly, no. It would be nice to get an answer (even if it is no), but I suppose the quote on their page reflects their current position:
For the time being, we are not supporting access to our low level APIs. This allows us to focus all our efforts on our solutions - Contract Intelligence, Message Intelligence and enterprise-class semantic search use cases. If you are interested in any of these solutions, please contact us.
I’ve sent a follow-up email mentioning that their semantic search solution looks like a good fit for the CORD-19 tasks, and that I would like to explore this. Will see if that sparks any interest.