Deep Visual-Semantic Alignments for Generating Image Descriptions

Because of the Nov. 14th submission deadline for this year's IEEE Conference on Computer Vision and Pattern Recognition (CVPR), several big image-recognition papers are coming out this week:

From Andrej Karpathy and Li Fei-Fei of Stanford: We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations... ( website with examples ) ( full paper )

From Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan at Google: Show and Tell: A Neural Image Caption Generator ( announcement post ) ( full paper )

From Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel at University of Toronto: Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models ( full paper )

From Junhua Mao, Wei Xu, Yi Yang, Jiang Wang and Alan L. Yuille at Baidu Research/UCLA: Explain Images with Multimodal Recurrent Neural Networks ( full paper )

From Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell at UT Austin, UMass Lowell and UC Berkeley: Long-term Recurrent Convolutional Networks for Visual Recognition and Description ( full paper )

All these came from this Hacker News discussion.

Boston Magazine Profiles Rodney Brooks of Rethink

Long article about Rodney Brooks, co-founder of Rethink and former CTO at iRobot: ...Brooks cofounded the Bedford-based iRobot in 1990, and his motivation, he explains, had something to do with vanity: “My thoughts on my self-image at the time was that I didn’t really want to be remembered for building insects.” Then he pauses for a moment and laughs. “But after that I started building vacuum-cleaning robots. And now there is a research group using Baxter to open stool samples. So now it’s shit-handling robots. I think maybe I should have quit while I was ahead. You know, that’s something no one ever says: ‘I hope my kid grows up to open stool samples... ( full article )

Grabit Inc. Demos Electrostatic Gripper

From Grabit Inc.:

Enhanced Flexibility: Grabit technology eliminates the need for part-specific grippers and minimizes gripper changeover, dramatically reducing costs and downtime.

Gentle Handling: Grabit grippers offer scratch and smudge-free handling with its clean grasping and eliminates the need to remove residue left by vacuum cups. Grabit’s uniform grasping effect eliminates high “point stresses” on large format glass sheets.

Low Energy & Quiet Operations: Grabit products operate at ultra-low energy levels providing cost savings and enabling mobile robot applications, and also offer quiet operations improving factory conditions and supporting the adoption of collaborative robots... ( homepage )

At Japan Robot Week, Mechanical Barista Treats Visitors to Coffee

From Japan Times:

iRobot Unveils Its First Multi-Robot Tablet Controller for First Responders, Defense Forces and Industrial Customers

From iRobot: The uPoint MRC system runs an Android-based app that standardizes the control of any robot within the iRobot family of unmanned vehicles. Utilizing the same intuitive touchscreen technology in use today on millions of digital devices, the uPoint MRC system simplifies robot operations including driving, manipulation and inspection, allowing operators to focus more on the mission at hand... ( full press release )

YuMi: ABB's Collaborative Robot

From ABB: YuMi is a human-friendly dual arm robot designed for a new era of automation, for example in small parts assembly, where people and robots work hand-in-hand on the same tasks. YuMi is short for ‘you and me,’ working together. YuMi has been developed to meet the flexible and agile production needs of the consumer electronics industry in the first instance. It will increasingly be rolled out to cover other market sectors. YuMi is a collaborative, dual arm assembly solution with the ability to feel and see. The robot's soft, padded dual arms, combined with innovative force-sensing technology, ensure the safety of YuMi's human co-workers. Safety is built into the functionality of the robot itself so that it can work cage-free... ( cont'd )

Beginners Tutorial: Autonomous Robot Control Theory

From Nick McCrea at Toptal: In this article, I’m going to describe the control scheme of my simulated robot, illustrate how it interacts with its environment and achieves its goals, and discuss some of the fundamental challenges of robotics that I encountered along the way... ( full article )

Unbounded Robotics to Shut Down Due to Issues With Willow Garage Spin-Off Agreement

From Evan Ackerman at IEEE Spectrum: IEEE Spectrum has learned that Unbounded Robotics, a spin-off of Willow Garage that developed the UBR-1 mobile manipulator, is currently "in the process of shutting down." The UBR-1 is no longer for sale. Recently, a source (who asked not to be named) forwarded us an email from Unbounded Robotics CEO Melonee Wise. Our source received the message after requesting a quote for a UBR-1 robot: "Unfortunately Unbounded Robotics is in the process of shutting down due to issues with our Willow Garage spin off agreement that prevents us from raising series A investment. Unbounded Robotics is no longer selling the UBR-1.... ( cont'd )

Large Scale Visual Recognition Challenge 2014

Introduction: This challenge evaluates algorithms for object detection and image classification at large scale. This year there will be two competitions: a PASCAL-style detection challenge on fully labeled data for 200 categories of objects, and an image classification plus object localization challenge with 1000 categories. NEW: This year all participants are encouraged to submit object localization results; in past challenges, submissions to classification and classification with localization tasks were accepted separately.

One high level motivation is to allow researchers to compare progress in detection across a wider variety of objects -- taking advantage of the quite expensive labeling effort. Another motivation is to measure the progress of computer vision for large scale image indexing for retrieval and annotation... ( rules and results )

New York Times article: Started in 2010 by Stanford, Princeton and Columbia University scientists, the Large Scale Visual Recognition Challenge this year drew 38 entrants from 13 countries. The groups use advanced software, in most cases modeled loosely on the biological vision systems, to detect, locate and classify a huge set of images taken from Internet sources like Twitter. The contest was sponsored this year by Google, Stanford, Facebook and the University of North Carolina. Contestants run their recognition programs on high-performance computers based in many cases on specialized processors called G.P.U.s, for graphic processing units. This year there were six categories based on object detection, locating objects and classifying them... ( cont'd )

Nvidia Jetson TK1 Development Kit

Features ( $192 ): Tegra K1 SOC; Kepler GPU with 192 CUDA cores; 4-Plus-1 quad-core ARM Cortex A15 CPU; 2 GB x16 memory with 64 bit width; 16 GB 4.51 eMMC memory; 1 Half mini-PCIE slot; 1 Full size SD/MMC connector; 1 Full-size HDMI port; 1 USB 2.0 port, micro AB; 1 USB 3.0 port, A; 1 RS232 serial port; 1 ALC5639 Realtek Audio codec with Mic in and Line out; 1 RTL8111GS Realtek GigE LAN; 1 SATA data port; SPI 4MByte boot flash.

Dr. Dobb's has an in-depth look here.

Robot With Broken Leg Learns To Walk Again In 2 Minutes

From Medium: Today, Antoine Cully at the Sorbonne University in Paris and a couple of pals say they’ve developed a technique that allows a damaged robot to learn how to walk again in just a few seconds. They say their work has important consequences for the reliability and robustness of future robots and may also provide some insight into the way that animals adapt to injury as well... ( cont'd )

Disassembling a 1980s Kuka Robot

Relaxing summertime viewing of an '80s-era Kuka robot being taken apart and dissected ( 20 parts total ).

Tessel: A Microcontroller That Runs JavaScript

From tessel.io: Tessel is a microcontroller that runs JavaScript. It's Node-compatible and ships with WiFi built in. Use it to easily make physical devices that connect to the web.

Features: Programmable via JavaScript; 14 different hardware modules for added capabilities; compatible with 10,000s of Node.js packages on NPM; deploy over USB or remotely by WiFi.

Hardware: 180 MHz ARM Cortex-M3 LPC1830; 32 MB SDRAM; 32 MB Flash; TI CC3000 WiFi radio; 20-pin GPIO bank for general prototyping; Micro USB or battery power. Starting at $99 (controller and one module).
