One of the hallmarks of human intelligence is the ability to learn new tasks despite the paucity of direct supervision. Machine learning models have recently achieved impressive performance in this setting by using the following protocol: i) Collect a massive dataset, ii) Train a very large model and iii) Adapt to downstream tasks using very little, if any, task-specific labeled data. While this has been working remarkably well, it is still dissatisfying because the information present in each downstream task is never transformed into actual knowledge that can be leveraged to improve the prediction of subsequent downstream tasks. As a result, once in a while even larger models need to be retrained from scratch to account for the ever increasing amount of data.
This begs two basic questions. First, what learning settings are useful to study knowledge accrual? And second, what methods are effective and efficient at learning from never-ending streams of data? In this talk, I will present a preliminary investigation in our quest to answer these questions. I will present experiments using anytime and continual learning with metrics accounting for both error rate and efficiency of learning through time.
I will also discuss how modular architectures can strike good trade-offs in this setting. These networks, whose computation is expressed as the composition of basic modules, can naturally grow over time to account for new incoming data by simply adding new modules to the existing set of modules, and they can retain efficiency as the number of modules grow if only a small and constant number of modules is used at inference time. While these are admittedly baby steps towards our original goal, we hope to stimulate discussion and interest in our community about the fundamental question of how to represent and accrue knowledge over time.