Computer Vision Journal Club

The Computer Vision Journal Club meets periodically to discuss papers in computer vision, machine learning, and other areas of interest, such as assistive technologies for persons who are blind or visually impaired, dual sensory loss (hearing and vision loss), neuroscience, and psychophysics. All are welcome to attend.


Fri Apr 21, 2017
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91-99).

Fri Apr 7, 2017
Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. NIPS 2014 (pp. 2204-2212).


Thurs Sept 29, 2016
Henry Lin and Max Tegmark. “Why does deep and cheap learning work so well?”

Thurs Apr 21, 2016
Fanello, S.R., Keskin, C., Izadi, S., Kohli, P., Kim, D., Sweeney, D., Criminisi, A., Shotton, J., Kang, S.B. and Paek, T., 2014. Learning to be a depth camera for close-range human capture and interaction. ACM Transactions on Graphics (TOG), 33(4), p.86.

Thurs Apr 7, 2016
Engel, J., Sturm, J., & Cremers, D. (2013). Semi-dense visual odometry for a monocular camera. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1449-1456).

Thurs Feb 18, 2016
O'Modhrain, S., Giudice, N. A., Gardner, J. A., & Legge, G. E. (2015). Designing media for visually-impaired users of refreshable touch displays: Possibilities and pitfalls. Haptics, IEEE Transactions on, 8(3), 248-257.

Thurs Feb 4, 2016
Michael A. Nielsen, "Neural Networks and Deep Learning." Determination Press. 2015. Chapters 4 and 5.


Thurs Oct 8, 2015
Yann LeCun, Yoshua Bengio & Geoffrey Hinton. "Deep learning." Nature 521, 436–444 (28 May 2015).

Thurs Aug 27, 2015
Lin, T. Y., Cui, Y., Belongie, S., & Hays, J. (2015). Learning Deep Representations for Ground-to-Aerial Geolocalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5007-5015).

Fri May 19, 2015
Tye-Murray N, Spehar B, Myerson J, Sommers MS, Hale S. Crossmodal enhancement of speech detection in young and older adults: Does signal content matter? Ear and Hearing. 2011;32(5):650-655. doi:10.1097/AUD.0b013e31821a4578.

Fri. May 15, 2015
Lederman, S. J., & Klatzky, R. L. (1987). "Hand movements: A window into haptic object recognition." Cognitive psychology, 19(3), 342-368. link

Fri. Apr. 24, 2015
Michael Nielsen. (2014). Neural Networks and Deep Learning. Chapter 3. link

Wed. Apr. 15, 2015
Ham, C., Lucey, S., & Singh, S. (2014). "Hand Waving Away Scale." ECCV 2014 (pp. 279-293). pdf

Fri. Mar. 27, 2015
Chandrika Jayant, Matt Renzelmann, Dana Wen, Satria Krisnandi, Richard Ladner, Dan Comden. "Automated Tactile Graphics Translation: In the Field." ASSETS 2007. link

Fri. Mar. 13, 2015
M. Uricar, V. Franc and V. Hlavac, Detector of Facial Landmarks Learned by the Structured Output SVM, VISAPP '12: Proceedings of the 7th International Conference on Computer Vision Theory and Applications, 2012. link pdf

Fri. Feb. 13, 2015
Michael Nielsen. (2014). Neural Networks and Deep Learning. Chapter 2. link

Fri. Jan. 23, 2015
Michael Nielsen. (2014). Neural Networks and Deep Learning. Chapter 1. link


Fri. Dec. 19, 2014
Guida, C., Comanducci, D., & Colombo, C. (2011). Automatic bus line number localization and recognition on mobile phones—a computer vision aid for the visually impaired. In Image Analysis and Processing–ICIAP 2011 (pp. 323-332). Springer Berlin Heidelberg. pdf

Fri. Nov. 14, 2014
F. Hu, Z. Zhu, and J. Zhang, "Mobile Panoramic Vision for Assisting the Blind via Indexing and Localization," Second Workshop on Assistive Computer Vision and Robotics, in conjunction with ECCV2014, Zurich, Switzerland, Sept 12, 2014. pdf

Fri. Oct. 24, 2014
A. Davis, M. Rubinstein, N. Wadhwa, G. Mysore, F. Durand, and W. T. Freeman, “The Visual Microphone: Passive Recovery of Sound from Video,” ACM Transactions on Graphics (Proc. SIGGRAPH), vol. 33, no. 4, pp. 79:1–79:10, 2014. link

Fri. Sept. 12, 2014
Kidron, E., Schechner, Y. Y., & Elad, M. (2005, June). Pixels that sound. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on (Vol. 1, pp. 88-95). IEEE. pdf

Fri. Aug. 15, 2014
Fu-Chung Huang, Gordon Wetzstein, Brian A. Barsky, and Ramesh Raskar. "Eyeglasses-free Display: Towards Correcting Visual Aberrations with Computational Light Field Displays." ACM Transactions on Graphics, xx:0, Aug. 2014. link

Fri. Aug. 1, 2014
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. pdf

Fri. May 16, 2014
Fanello, Sean Ryan, Ilaria Gori, Giorgio Metta, and Francesca Odone. "Keep it simple and sparse: real-time action recognition." The Journal of Machine Learning Research 14, no. 1 (2013): 2617-2640. pdf

Fri. Apr. 25, 2014
Oikonomidis, Iason, Nikolaos Kyriazis, and Antonis A. Argyros. "Efficient model-based 3D tracking of hand articulations using Kinect." In BMVC, pp. 1-11. 2011. pdf, link to code

Fri. Apr. 11, 2014
Bo, Liefeng, Xiaofeng Ren, and Dieter Fox. "Unsupervised feature learning for RGB-D based object recognition." Experimental Robotics. Springer International Publishing, 2013. pdf

Thurs. Mar. 27, 2014
Chen, Grossman, Wigdor, Fitzmaurice (2014). "Duet: Exploring Joint Interactions on a Smart Phone and a Smart Watch." CHI 2014.

Fri. Mar. 14, 2014
Terven, Juan, J. Salas, and B. Raducanu. "Computer Vision Systems for Visually Impaired People." (2013): 1-1. pdf

Fri. Feb. 21, 2014
Dollár, P., & Zitnick, C. L. (2013). Structured Forests for Fast Edge Detection. ICCV 2013. pdf

Fri. Jan. 24, 2014
Goodfellow, Ian J., et al. "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks." arXiv preprint arXiv:1312.6082 (2013). pdf

Fri. Jan. 10, 2014
Li, B.Y.L.; Mian, A.S.; Wanquan Liu; Krishna, A., "Using Kinect for face recognition under varying poses, expressions, illumination and disguise," Applications of Computer Vision (WACV), 2013 IEEE Workshop on , vol., no., pp.186,192, 15-17 Jan. 2013. pdf


Fri. Dec. 13, 2013
A. Bansal, A. Kowdle, D. Parikh, A. C. Gallagher and C. L. Zitnick. "Which Edges Matter?" Workshop on 3D Representation and Recognition (3dRR) ICCV 2013. pdf

Fri. Nov. 8, 2013
Kane, S.K., Morris, M.R. and Wobbrock, J.O. (2013). Touchplates: Low-cost tactile overlays for visually impaired touch screen users. Proceedings of the ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '13). Bellevue, Washington (October 21-23, 2013). pdf

Fri. Nov. 1, 2013
Olshausen, Bruno A. "20 years of learning about vision: Questions answered, questions unanswered, and questions not yet asked." In proc. of "20 Years of Computational Neuroscience." J.M. Bower, Ed. (Symposium of CNS2010 annual meeting.) pdf

Fri. Oct. 25, 2013
Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba. "HOGgles: Visualizing Object Detection Features." ICCV 2013. link

Fri. Oct. 18, 2013
Haselhoff, Anselm, and Anton Kummert. "On visual crosswalk detection for driver assistance systems." Intelligent Vehicles Symposium (IV), 2010 IEEE. IEEE, 2010.

Fri. Sept. 20, 2013
Simon Alexanderson, Jonas Beskow, Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions. Computer Speech & Language, Available online 5 March 2013, ISSN 0885-2308. pdf

Fri. Sept. 13, 2013
Fusco, G., Zini, L., Noceti, N., & Odone, F. (2013). Structured Multi-class Feature Selection for Effective Face Recognition. In Image Analysis and Processing–ICIAP 2013 (pp. 410-419). Springer Berlin Heidelberg. pdf

Fri. Sept. 6, 2013
Vázquez, M. & Steinfeld, A. (2012). Helping visually impaired users properly aim a camera. International ACM SIGACCESS Conference on Computers and Accessibility.

Thurs. Aug. 22, 2013
Changil Kim, Henning Zimmer, Yael Pritch, Alexander Sorkine-Hornung, Markus Gross. "Scene Reconstruction from High Spatio-Angular Resolution Light Fields." ACM Transactions on Graphics 32(4) (Proceedings of SIGGRAPH 2013). link

Fri. Aug. 2, 2013
Yui Man Lui. "Human Gesture Recognition on Product Manifolds." Journal of Machine Learning Research 13 (2012) 3297-3321. pdf

Fri. July 26, 2013
Shai Shalev-Shwartz, Yonatan Wexler, Amnon Shashua. "ShareBoost: Efficient multiclass learning with feature sharing." NIPS 2011. link

Fri. July 12, 2013
Hamed Pirsiavash, Deva Ramanan. "Detecting Activities of Daily Living in First-person Camera Views." CVPR 12. pdf

Fri. Jun. 21, 2013
Hara, Kotaro, Victoria Le, and Jon E. Froehlich. "Combining Crowdsourcing and Google Street View to Identify Street-level Accessibility Problems." CHI 2013. pdf

Fri. Jun. 7, 2013
Brilhault, A., Kammoun, S., Gutierrez, O., Truillet, P., & Jouffrais, C. (2011, February). Fusion of artificial vision and GPS to improve blind pedestrian positioning. In New Technologies, Mobility and Security (NTMS), 2011 4th IFIP International Conference on (pp. 1-5). IEEE. pdf

Fri. May 31, 2013
Wu, H. Y., Rubinstein, M., Shih, E., Guttag, J., Durand, F., & Freeman, W. (2012). "Eulerian video magnification for revealing subtle changes in the world." ACM Transactions on Graphics (TOG), 31(4), 65. link

Fri. May 24, 2013
Poignant, J.; Besacier, L.; Quenot, G.; Thollard, F., "From Text Detection in Videos to Person Identification," /Multimedia and Expo (ICME), 2012 IEEE International Conference on/ , vol., no., pp.854,859, 9-13 July 2012 doi: 10.1109/ICME.2012.119. pdf

Fri. Apr. 19, 2013
Azenkot, S., Rector, K., Ladner, R.E. and Wobbrock, J.O. (2012). PassChords: Secure multi-touch authentication for blind people. Proceedings of the ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '12). Boulder, Colorado (October 22-24, 2012). New York: ACM Press, pp. 159-166. Best Paper Winner. pdf

Fri. Apr. 12, 2013
Hwang, J., Ji, Y., & Kim, E. Y. (2012). Intelligent situation awareness on the EYECANE. In PRICAI 2012: Trends in Artificial Intelligence (pp. 740-745). Springer Berlin Heidelberg. link (behind paywall)

Fri. Apr. 5, 2013
Sebsadji, Y.; Tarel, J. -P; Foucher, P.; Charbonnier, P., "Robust road marking extraction in urban environments using stereo images," Intelligent Vehicles Symposium (IV), 2010 IEEE , vol., no., pp.394,400, 21-24 June 2010. pdf

Thurs. Mar. 28, 2013
Pedro Domingos. "A Few Useful Things to Know about Machine Learning." Communications of the ACM 55.10 (2012): 78-87. (Brief summary: This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.) pdf

Wed. Mar. 20, 2013
Two short papers:
Rusu, Radu Bogdan, and Steve Cousins. "3d is here: Point cloud library (pcl)." Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011. pdf
Steder, Bastian, et al. "NARF: 3D range image features for object recognition." Workshop on Defining and Solving Realistic Perception Problems in Personal Robotics at the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS). Vol. 44. 2010. pdf

Fri. Mar. 8, 2013
Kalal, Z.; Mikolajczyk, K.; Matas, J. "Tracking-Learning-Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, no.7, pp.1409-1422, July 2012. pdf link

Fri. Feb. 22, 2013
Erin Brady, Meredith Ringel Morris, Yu Zhong, Samuel C. White and Jeffrey P. Bigham. "Visual Challenges in the Everyday Lives of Blind People." In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 2013). Paris, France. To Appear. pdf

Fri. Feb. 15, 2013
S. Wang, C. Yi, and Y. Tian, “Signage Detection and Recognition for Blind Persons to Access Unfamiliar Environments,” Journal of Computer Vision and Image Processing, Vol. 2, No. 2, 2012. pdf

Fri. Feb. 8, 2013
Ramalingam, S.; Bouaziz, S.; Sturm, P.; Brand, M., “SKYLINE2GPS: Localization in Urban Canyons Using Omni-skylines”, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), ISSN 2153-0858, pp. 3816-3823, October 2010. pdf

Thurs. Jan. 31, 2013
Antonio Torralba and William T. Freeman. "Accidental pinhole and pinspeck cameras: revealing the scene outside the picture." CVPR 2012. link

Fri. Jan. 25, 2013
Pablo F. Alcantarilla, Adrien Bartoli and Andrew J. Davison. "KAZE Features." (Brief summary: KAZE Features is a novel 2D feature detection and description method that operates completely in a nonlinear scale space.) ECCV 2012. link

Fri. Jan. 11, 2013
Adam O’Donovan and Ramani Duraiswami. "Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing." CVPR 2007. pdf


Fri. Dec. 14, 2012
Jeffrey Heer and Maureen Stone. "Color Naming Models for Color Selection, Image Editing and Palette Design." Computer-Human Interaction (CHI '12). pdf

Fri. Dec. 7, 2012
Ranzato, M., et al. "On deep generative models with applications to recognition." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011. pdf

Fri. Nov. 16, 2012
Radvanyi, Mihály, Balázs Varga, and Kristóf Karacs. "Advanced crosswalk detection for the Bionic Eyeglass." Cellular Nanoscale Networks and Their Applications (CNNA), 2010 12th International Workshop on. IEEE, 2010. and Radványi, Mihály, and Kristóf Karacs. "Navigation through crosswalks with the Bionic Eyeglass." Applied Sciences in Biomedical and Communication Technologies (ISABEL), 2010 3rd International Symposium on. IEEE, 2010.

Fri. Nov. 9, 2012
A. Abrams, C. Hawley, and R. Pless. Heliometric Stereo: Shape From Sun Position. In Proc. European Conference on Computer Vision, October 2012. link

Fri. Oct. 26, 2012
Nikhil Naikal, Allen Y. Yang, and S. Shankar Sastry. "Informative Feature Selection for Object Recognition via Sparse PCA." ICCV 2011. pdf

Wed. Oct. 17, 2012
Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta and Alexei A. Efros. "Data-driven Visual Similarity for Cross-domain Image Matching." SIGGRAPH Asia, 2011. link

Fri. Oct. 5, 2012
Wilson S. Geisler, Jiri Najemnik and Almon D. Ing. "Optimal stimulus encoders for natural tasks." Journal of Vision. December 16, 2009. Vol. 9, no. 13, article 17. link

Thurs. Sept. 27, 2012
Jose M. Alvarez, Theo Gevers, Yann LeCun and Antonio M. Lopez. "Road Scene Segmentation from a Single Image." ECCV 2012. pdf

Thurs. Sept. 20, 2012
Aditya Khosla, Tinghui Zhou, Tomasz Malisiewicz, Alexei A. Efros, and Antonio Torralba. "Undoing the Damage of Dataset Bias." ECCV 2012. pdf

Thurs. Sept. 13, 2012
Yi Wu, Bin Shen, Haibin Ling. "Online Robust Image Alignment via Iterative Convex Optimization." CVPR 2012. pdf

Fri. Sept. 7, 2012
Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, Andrew Zisserman. "Discriminative Learned Dictionaries for Local Image Analysis." CVPR 2008. pdf

Thurs. Aug. 2, 2012
Hsueh-Cheng Wang and Marc Pomplun. "The attraction of visual attention to texts in real-world scenes." Journal of Vision. (2012) 12(6):26, 1–17. pdf

Thurs. July 26, 2012
Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean and Andrew Y. Ng. "Building High-Level Features using Large Scale Unsupervised Learning." In Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012. pdf

Thurs. July 19, 2012
Felix von Hundelshausen and Rahul Sukthankar. "D-Nets: Beyond Patch-Based Image Descriptors." CVPR 2012. pdf

Fri. Jun. 15, 2012
Kang, S.B., Uyttendaele, M., Winder, S. & Szeliski, R. High dynamic range video. ACM Transactions on Graphics (Proc. SIGGRAPH 2003), 22(3):319-325, July 2003. pdf

Wed. Jun. 6, 2012
Deng, Berg, Li & Fei-Fei. "What Does Classifying More Than 10,000 Image Categories Tell Us?" ECCV 2010. pdf

Fri. Jun. 1, 2012
A. Shahab, F. Shafait, and A. Dengel. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. ICDAR 2011. pdf

Fri. May 25, 2012
Lukas Neumann and Jiri Matas. "Text Localization in Real-world Images using Efficiently Pruned Exhaustive Search." ICDAR 2011. pdf

Fri. May 18, 2012
Lukas Neumann and Jiri Matas. "Real-Time Scene Text Localization and Recognition." CVPR '12. pdf

Fri. May 11, 2012
Nobuo Ezaki, Marius Bulacu and Lambert Schomaker, "Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons." ICPR'04. pdf

Fri. Apr. 27, 2012
Clemens Arth, Manfred Klopschitz, Gerhard Reitmayr and Dieter Schmalstieg. "Real-Time Self-Localization from Panoramic Images on Mobile Devices." IEEE International Symposium on Mixed and Augmented Reality 2011 Science and Technology Proceedings. 26-29 October, Basel, Switzerland. pdf

Fri. Apr. 20, 2012
Carlos Merino-Gracia, Karel Lenc and Majid Mirmehdi. "A Head-mounted Device for Recognizing Text in Natural Scenes." Fourth International Workshop on Camera-Based Document Analysis and Recognition (CBDAR 2011). September 2011. pdf

Wed. Apr. 11, 2012
Gregory K. Myers and Brian Burns. "A Robust Method for Tracking Scene Text in Video Imagery." First International Workshop on Camera-Based Document Analysis and Recognition. Seoul, Korea. August 2005. pdf

Fri. Apr. 6, 2012
Yi-Feng Pan, Xinwen Hou, Cheng-Lin Liu. "A hybrid approach to detect and localize texts in natural scene images." IEEE Transactions on Image Processing Volume: 20, Issue: 3, Pages: 800-813, 2011. pdf

Fri. Mar. 30, 2012
Lukas Neumann and Jiri Matas. "A method for text localization and recognition in real-world images." ACCV 2010. pdf

Fri. Mar. 16, 2012
Rodrigo Minetto, Nicolas Thome, Matthieu Cord, Neucimar Leite, Jorge Stolfi. "SnooperTrack: Text Detection and Tracking for Outdoor Videos." ICIP 2011. pdf

Fri. Mar. 9, 2012
Y. Avrithis, K. Rapantzikos. The Medial Feature Detector: Stable Regions from Image Boundaries. In Proceedings of International Conference on Computer Vision (ICCV 2011), Barcelona, Spain, November 2011. Project page for this paper (including the download, at the bottom of the page)

Thurs. Feb. 2, 2012
Kwang In Kim; Keechul Jung; Jin Hyung Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm", In IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 12, DECEMBER 2003. pdf

Tues. Jan. 17, 2012
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng. "Digits in Natural Images with Unsupervised Feature Learning." NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011. pdf

Fri. Jan. 6, 2012
J. Matas, O. Chum, M. Urban, and T. Pajdla. "Robust wide baseline stereo from maximally stable extremal regions." Proc. of British Machine Vision Conference, pages 384-396, 2002. pdf


Fri. Dec. 16, 2011
Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman. "Discovering object categories in image collections." MIT-CSAIL-TR-2005-012. pdf

Fri. Dec. 9, 2011
Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In ICCV, 2011. Project page for this paper (including the download, at the bottom of the page)

Fri. Dec. 2, 2011
Quoc V. Le, Will Y. Zou, Serena Y. Yeung, Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011. pdf

Thurs. Nov. 17, 2011
Culotta, Kristjansson, McCallum & Viola. "Corrective Feedback and Persistent Learning for Information Extraction." Journal Artificial Intelligence. Volume 170 Issue 14, October 2006. pdf

Thurs. Nov. 10, 2011
Carl Vondrick and Deva Ramanan, "Video Annotation and Tracking with Active Learning." Neural Information Processing Systems (NIPS) Granada, Spain, December 2011.
pdf (paper), pdf (slides)

Thurs. Nov. 3, 2011
Huizhong Chen, Sam S. Tsai, Georg Schroth, David M. Chen, Radek Grzeszczuk and Bernd Girod. "Robust text detection in natural images with edge-enhanced maximally stable extremal regions." ICIP 2011. pdf

Thurs. Oct. 27, 2011
Jung-Jin Lee, Pyoung-Hean Lee, Seong-Whan Lee, Alan Yuille and Christof Koch. "AdaBoost for Text Detection in Natural Scene." ICDAR 2011. pdf

Thurs. Oct. 20, 2011
Simon Hawe, Martin Kleinsteuber, and Klaus Diepold. "Dense Disparity Maps from Sparse Disparity Measurements." ICCV 2011. pdf

Fri. Oct. 14, 2011
Monaci, G. ; Jost, P. ; Vandergheynst, P. ; Mailhe, B. ; Lesage, S. ; Gribonval, R. "Learning Multi-Modal Dictionaries." IEEE Transactions on Image Processing, vol. 16, num. 9, p. 2272-2283. 2007. pdf

Fri. Sept. 30, 2011
Adam Coates, Blake Carpenter, Carl Case, Sanjeev Satheesh, Bipin Suresh, Tao Wang, Andrew Y. Ng. "Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning." ICDAR 2011. pdf

Fri. Sept. 23, 2011
A. Torralba, R. Fergus, W. T. Freeman. "80 million tiny images: a large dataset for non-parametric object and scene recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30(11), pp. 1958-1970, 2008. pdf

Wed. Sept. 14, 2011
Sebastian Nowozin, Carsten Rother, Shai Bagon, Toby Sharp, Bangpeng Yao, and Pushmeet Kohli. "Decision Tree Fields." ICCV 2011. pdf

Tues. Sept. 6, 2011
V. N. Murali and S. T. Birchfield. "Autonomous Navigation and Mapping Using Monocular Low-Resolution Grayscale Vision." IEEE Computer Society Workshop on Visual Localization for Mobile Platforms (in association with CVPR) 2008. pdf

Fri. Aug. 26, 2011
Anna Bosch, Andrew Zisserman, Xavier Munoz. "Image Classification using Random Forests and Ferns." ICCV 2007. pdf

Fri. Aug. 19, 2011
Mustafa Özuysal, Pascal Fua, Vincent Lepetit. "Fast Keypoint Recognition in Ten Lines of Code." CVPR 2007. pdf

Fri. Aug. 5, 2011
Kai Wang, Boris Babenko, and Serge Belongie. "End-to-end Scene Text Recognition." ICCV 2011, Barcelona, Spain. pdf

Thurs. Jul. 14, 2011
Pinto N, Barhomi Y, Cox DD, DiCarlo JJ. "Comparing State-of-the-Art Visual Features on Invariant Object Recognition Tasks." IEEE Workshop on Applications of Computer Vision (WACV 2011). pdf

Fri. Jun. 17, 2011
Muller, K.-R., Mika, S., Ratsch, G., Tsuda, K., Scholkopf, B. "An Introduction to Kernel-based Learning Algorithms." IEEE Transactions on Neural Networks. Vol. 12, No. 2. Mar. 2001. pdf

Fri. Jun. 10, 2011
Tommi S. Jaakkola and David Haussler. "Exploiting generative models in discriminative classifiers." NIPS 1998. pdf

Fri. May 20, 2011
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng. "Self-taught Learning: Transfer Learning from Unlabeled Data." Proceedings of the 24th International Conference on Machine Learning. Corvallis, OR, 2007. pdf

Thurs. May 12, 2011
Shenghua Gao, Ivor Wai-Hung Tsang, Liang-Tien Chia, Peilin Zhao. "Local Features Are Not Lonely – Laplacian Sparse Coding for Image Classification." CVPR 2010. pdf

Fri. May 6, 2011
Matthieu Guillaumin, Jakob Verbeek and Cordelia Schmid. "Multimodal semi-supervised learning for image classification." CVPR 2010. pdf

Fri. Apr. 22, 2011
Xiaojin Zhu. "Semi-Supervised Learning Literature Survey." Technical Report 1530, Department of Computer Sciences, University of Wisconsin, Madison, 2005. pdf

Fri. Apr. 8, 2011
David Mackay. Chapters 42.1-42.5 (Hopfield networks and associative memory) and 43 (Boltzmann Machines) from Information Theory, Inference, and Learning Algorithms. 2003. pdf

Mon. Mar. 28, 2011
Geoffrey E. Hinton, Simon Osindero and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation, Vol. 18, 1527-1554. 2006. pdf

Thurs. Mar. 10 and Fri. Mar. 18, 2011
Honglak Lee, Roger Grosse, Rajesh Ranganath and Andrew Y. Ng. "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations." ICML 2009. pdf

Thurs. Mar. 3, 2011
T. E. de Campos, B. R. Babu and M. Varma. "Character recognition in natural images." In Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, February 2009. pdf

Fri. Feb. 4, 2011
Trevor Darrell, John W. Fisher III, Paul Viola, William Freeman. "Audio-visual Segmentation and 'The Cocktail Party Effect'." International Conference on Multimodal Interfaces. 2000. pdf

Tues. Jan. 25, 2011
Sven Olufs and Markus Vincze. "Room-Structure estimation in Manhattan-like Environments from dense 2.5D range data using minimum Entropy and Histograms." WACV 2011.

Tues. Jan. 18, 2011
Jerod Weinman, Erik Learned-Miller and Allen Hanson. "Scene text recognition using similarity and a lexicon with sparse belief propagation." IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Special Issue on Probabilistic Graphical Models, Vol. 31 No. 10, pp. 1733-1746, 2009. pdf


Mon. Dec. 20, 2010
Victor Fragoso, Steffen Gauglitz, Jim Kleban and Shane Zamora. "TranslatAR: A Mobile Augmented Reality Translator on the Nokia N900." WACV 2011. pdf

Fri. Dec. 10, 2010
T.F. Cootes and C.J. Taylor. "Statistical models of appearance for medical image analysis and computer vision." Proc. SPIE Medical Imaging, 2001. pdf

Fri. Dec. 3, 2010
Iryna Gordon and David G. Lowe, "What and where: 3D object recognition with accurate pose," in Toward Category-Level Object Recognition, eds. J. Ponce, M. Hebert, C. Schmid, and A. Zisserman, (Springer-Verlag, 2006), pp. 67-82. pdf

Fri. Nov. 19, 2010
S. Gu and Y. Zheng and C. Tomasi. "Efficient Visual Object Tracking with Online Nearest Neighbor Classifier." Asian Conference on Computer Vision. December 2010. pdf

Fri. Oct. 29, 2010
Emmanuel J. Candès and Michael B. Wakin. "An Introduction To Compressive Sampling." IEEE Signal Processing Magazine. Mar. 2008. pdf

Fri. Oct. 22, 2010
M. J. Wainwright, T. S. Jaakkola and A. S. Willsky. "MAP estimation via agreement on trees: Message-passing and linear programming." IEEE Transactions on Information Theory, November 2005. pdf

Fri. Oct. 15, 2010
Thomas Deselaers, Bogdan Alexe, and Vittorio Ferrari. "Localizing Objects While Learning Their Appearance." ECCV 2010. pdf

Fri. Oct. 8, 2010
Regis Behmo, Paul Marcombes, Arnak Dalalyan and Veronique Prinet. "Towards Optimal Naive Bayes Nearest Neighbor." ECCV 2010.

Fri. Oct. 1, 2010
Oren Boiman, Eli Shechtman and Michal Irani. "In defense of Nearest-Neighbor based image classification." CVPR 2008. pdf

Fri. Sept. 17, 2010
Passages from Richard Szeliski's book, "Computer Vision: Algorithms and Applications." Download entire book here.
Topics discussed: Chapter 7, sections 7.1, 7.2, 7.4.1 and 7.4.2 (structure from motion).

Thurs. Aug. 26, 2010
Juergen Gall and Victor Lempitsky. "Class-specific hough forests for object detection." CVPR 2009.
We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locations of the centroid of the whole object; the detection hypotheses then correspond to the maxima of the Hough image that accumulates the votes from all parts. However, whereas the previous methods detect object parts using generative codebooks of part appearances, we take a more discriminative approach to object part detection. Towards this end, we train a class-specific Hough forest, which is a random forest that directly maps the image patch appearance to the probabilistic vote about the possible location of the object centroid. We demonstrate that Hough forests improve the results of the Hough-transform object detection significantly and achieve state-of-the-art performance for several classes and datasets.

Fri. Aug. 20, 2010
Olga Barinova, Victor Lempitsky and Pushmeet Kohli. "On Detection of Multiple Object Instances using Hough Transforms." CVPR 2010.
To detect multiple objects of interest, the methods based on Hough transform use non-maxima suppression or mode seeking in order to locate and to distinguish peaks in Hough images. Such postprocessing requires tuning of extra parameters and is often fragile, especially when objects of interest tend to be closely located. In the paper, we develop a new probabilistic framework that is in many ways related to Hough transform, sharing its simplicity and wide applicability. At the same time, the framework bypasses the problem of multiple peaks identification in Hough images, and permits detection of multiple objects without invoking nonmaximum suppression heuristics. As a result, the experiments demonstrate a significant improvement in detection accuracy both for the classical task of straight line detection and for a more modern category-level (pedestrian) detection problem.

Fri. Aug. 13, 2010
Passages from Richard Szeliski's book, "Computer Vision: Algorithms and Applications." Download entire book here.
Topics discussed:
Feature Based alignment, chapter 6 (first two sections: 2D alignment and Panography): pp 313-317.
Image stitching, chapter 9, sections: 9.1.6 (Cylindrical and Spherical coordinates) pp. 440-443; 9.3.4. (Blending) pp. 461-464.

Fri. Aug. 6, 2010
Kai Wang and Serge Belongie. "Word Spotting in the Wild." ECCV 2010.
We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs -- text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties and propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider object categories to be the words themselves. We compare performance of leading OCR engines (one open source and one proprietary) with our new approach on the ICDAR Robust Reading data set and a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on the data sets, demonstrating the feasibility of a new approach to a seemingly old problem.

Fri. July 30, 2010
David Gallup, Jan-Michael Frahm, Marc Pollefeys. "Piecewise Planar and Non-Planar Stereo for Urban Scene Reconstruction." CVPR 2010.
Piecewise planar models for stereo have recently become popular for modeling indoor and urban outdoor scenes. The strong planarity assumption overcomes the challenges presented by poorly textured surfaces, and results in low complexity 3D models for rendering, storage, and transmission. However, such a model performs poorly in the presence of non-planar objects, for example, bushes, trees, and other clutter present in many scenes. We present a stereo method capable of handling more general scenes containing both planar and non-planar regions. Our proposed technique segments an image into piecewise planar regions as well as regions labeled as non-planar. The nonplanar regions are modeled by the results of a standard multi-view stereo algorithm. The segmentation is driven by multi-view photoconsistency as well as the result of a color and texture-based classifier, learned from hand-labeled planar and non-planar image regions. Additionally our method links and fuses plane hypotheses across multiple overlapping views, ensuring a consistent 3D reconstruction over an arbitrary number of images. Using our system, we have reconstructed thousands of frames of street-level video. Results show our method successfully recovers piecewise planar surfaces alongside general 3D surfaces in challenging scenes containing large buildings as well as residential houses.

Wed. July 7, 2010
Uwe Schmidt, Qi Gao and Stefan Roth. "A Generative Perspective on MRFs in Low-Level Vision." CVPR 2010.
Markov random fields (MRFs) are popular and generic probabilistic models of prior knowledge in low-level vision. Yet their generative properties are rarely examined, while application-specific models and non-probabilistic learning are gaining increased attention. In this paper we revisit the generative aspects of MRFs, and analyze the quality of common image priors in a fully application-neutral setting. Enabled by a general class of MRFs with flexible potentials and an efficient Gibbs sampler, we find that common models do not capture the statistics of natural images well. We show how to remedy this by exploiting the efficient sampler for learning better generative MRFs based on flexible potentials. We perform image restoration with these models by computing the Bayesian minimum mean squared error estimate (MMSE) using sampling. This addresses a number of shortcomings that have limited generative MRFs so far, and leads to substantially improved performance over maximum a-posteriori (MAP) estimation. We demonstrate that combining our learned generative models with sampling based MMSE estimation yields excellent application results that can compete with recent discriminative methods.
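
Discussion aid (not from the paper): the difference between the MAP estimate and a sampling-based MMSE estimate can be seen on a toy posterior over a single pixel intensity. The values, probabilities, and function names below are made-up assumptions for illustration, and plain random sampling stands in for the Gibbs sampler the abstract mentions; this is a minimal sketch, not the authors' method.

```python
import random


def mmse_from_samples(samples):
    """Approximate the MMSE estimate (the posterior mean) by averaging samples."""
    return sum(samples) / len(samples)


def map_from_pmf(values, probs):
    """Return the single most probable value (the MAP estimate)."""
    return max(zip(values, probs), key=lambda vp: vp[1])[0]


# Hypothetical posterior over one pixel intensity (illustrative numbers only).
values = [0.0, 0.5, 1.0]
probs = [0.4, 0.3, 0.3]

rng = random.Random(0)  # fixed seed so the run is repeatable
# Stand-in for a real sampler: draw directly from the toy posterior.
samples = rng.choices(values, weights=probs, k=20000)

mmse_estimate = mmse_from_samples(samples)  # close to the true mean, 0.45
map_estimate = map_from_pmf(values, probs)  # the mode, 0.0

print("MAP:", map_estimate, "MMSE (sampled):", round(mmse_estimate, 3))
```

In this toy case the MAP estimate sits at the mode (0.0) while the MMSE estimate averages over the whole posterior (about 0.45), which is one way to see why the two estimators can give different restorations.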

Fri. Jun. 25, 2010
Boris Epshtein, Eyal Ofek, Yonatan Wexler. "Detecting Text in Natural Scenes with Stroke Width Transform." CVPR 2010.
We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.

April 2010
Gregory Rogez, Jonathan Rihan, Srikumar Ramalingam, Carlos Orrite and Philip H.S. Torr. "Randomized Trees for Human Pose Detection." CVPR 2008.
This paper addresses human pose recognition from video sequences by formulating it as a classification problem. Unlike much previous work we do not make any assumptions on the availability of clean segmentation. The first step of this work consists in a novel method of aligning the training images using 3D Mocap data. Next we define classes by discretizing a 2D manifold whose two dimensions are camera viewpoint and actions. Our main contribution is a pose detection algorithm based on random forests. A bottom-up approach is followed to build a decision tree by recursively clustering and merging the classes at each level. For each node of the decision tree we build a list of potentially discriminative features using the alignment of training images; in this paper we consider Histograms of Orientated Gradient (HOG). We finally grow an ensemble of trees by randomly sampling one of the selected HOG blocks at each node. Our proposed approach gives promising results with both fixed and moving cameras.


Dec. 2009
Christoph H. Lampert, Hannes Nickisch, Stefan Harmeling. "Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer." CVPR 2009.
We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes and for only a very few of them image collections have been formed and annotated with suitable class labels.
In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson’s classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.
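
Discussion aid (not from the paper): the idea of recognizing a class from its attribute description alone can be sketched in a few lines. The attribute names, class signatures, and probabilities below are hypothetical, and the scoring rule is a simplified assumption (it treats attributes as independent and ignores attribute priors); it is not presented as the authors' exact formulation.

```python
import math


def score_class(attr_probs, signature):
    """Log-likelihood that predicted attribute probabilities match a binary signature."""
    total = 0.0
    for p, present in zip(attr_probs, signature):
        q = p if present == 1 else 1.0 - p
        total += math.log(max(q, 1e-12))  # clamp to avoid log(0)
    return total


def classify_unseen(attr_probs, signatures):
    """Pick the class whose attribute signature best explains the predictions."""
    return max(signatures, key=lambda name: score_class(attr_probs, signatures[name]))


# Hypothetical attributes: [striped, four-legged, lives in water].
# These signatures come from a human description, not from training images.
signatures = {
    "zebra": [1, 1, 0],
    "whale": [0, 0, 1],
}

# Assumed outputs of attribute classifiers pre-learned on other classes.
attr_probs = [0.9, 0.8, 0.1]

predicted = classify_unseen(attr_probs, signatures)
print(predicted)
```

The point of the sketch is that neither "zebra" nor "whale" needs any training image: only the attribute classifiers are learned, and the class decision comes from the attribute table.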