From Yezhou Yang, Yi Li, Cornelia Fermuller and Yiannis Aloimonos:
In order to advance action generation and creation in robots beyond simple learned schemas we need computational tools that allow us to automatically interpret and represent human actions. This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions of seen longer actions in video in order to acquire knowledge for robots. The lower level of the system consists of two convolutional neural network (CNN) based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a probabilistic manipulation action grammar based parsing module that aims at generating visual sentences for robot manipulation.
The list of the grasping types.
Experiments conducted on a publicly available unconstrained video dataset show that the system is able to learn manipulation actions by “watching” unconstrained videos with high accuracy.... (article at Kurzweilai.net) (original paper)
Winter break homework from John Carmack. Gamasutra reprint article "In-depth: Functional programming in C++":
A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in. In a multithreaded environment, the lack of understanding and the resulting problems are greatly amplified, almost to the point of panic if you are paying attention. Programming in a functional style makes the state presented to your code explicit, which makes it much easier to reason about, and, in a completely pure system, makes thread race conditions impossible... (full article)
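The excerpt's point about explicit state does not depend on the language. As a minimal sketch (in Python rather than the article's C++, with a hypothetical `Entity` type and `apply_damage` function that do not come from the article), a pure function takes all the state it needs as arguments and returns a new value instead of mutating shared data:

```python
from dataclasses import dataclass, replace


# Hypothetical game-entity state, used only for illustration.
# frozen=True makes instances immutable, so no caller can modify them in place.
@dataclass(frozen=True)
class Entity:
    health: int
    armor: int


def apply_damage(entity: Entity, damage: int) -> Entity:
    """Pure function: the result depends only on the arguments, and the
    input entity is never modified; a new Entity is returned instead."""
    absorbed = min(entity.armor, damage)  # armor soaks up damage first
    return replace(
        entity,
        armor=entity.armor - absorbed,
        health=max(0, entity.health - (damage - absorbed)),
    )


before = Entity(health=100, armor=20)
after = apply_damage(before, 50)
```

Because `apply_damage` only reads its immutable arguments and never writes to shared data, two threads calling it on the same `Entity` have nothing to race over, which is the property the excerpt attributes to a pure style.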
Also "Lessons to learn from Oculus development team when using the “Modern C++” approach":
Modern C++ doesn’t necessarily imply the overuse of templates
Andrei Alexandrescu says about the Modern C++ design:
"Modern C++ Design defines and systematically uses generic components - highly flexible design artifacts that are mixable and matchable to obtain rich behaviors with a small, orthogonal body of code."
Modern C++ has a close relation with generic programming; probably it’s the reason that makes many developers neglect the modern C++ approach. They think that the code will be mostly implemented as templates, which makes the code difficult to read and maintain.
In the SDK, the templates represent only 20% of all types defined and most of them are related to the technical layer... (full article)
From the OpenCV Foundation:
OpenCV Foundation with support from DARPA and Intel Corporation are launching a community-wide challenge to update and extend the OpenCV library with state-of-art algorithms. An award pool of $50,000 is provided to reward submitters of the best performing algorithms in the following 11 CV application areas: (1) image segmentation, (2) image registration, (3) human pose estimation, (4) SLAM, (5) multi-view stereo matching, (6) object recognition, (7) face recognition, (8) gesture recognition, (9) action recognition, (10) text recognition, (11) tracking.
The OpenCV Vision Challenge Committee will judge up to five best entries.
You may submit a new algorithm developed by yourself or your implementation of an existing algorithm even if you are not the author of the algorithm.
You may enter any number of categories.
If your entry wins the contest you will be awarded $1K.
To win an additional $7.5K to $9K, you must contribute the source code as an OpenCV pull request under a BSD license.
You acknowledge that your contributed code may be included, with your copyright, in OpenCV.
You may explicitly enter code for any work you have submitted to CVPR 2015 or its workshops. We will not unveil it until after CVPR.
Submission Period: Now – May 8th 2015
Winners Announcement: June 8th 2015 at CVPR 2015
Because of the Nov. 14th submission deadline for this year's IEEE Conference on Computer Vision and Pattern Recognition (CVPR), several big image-recognition papers are coming out this week:
From Andrej Karpathy and Li Fei-Fei of Stanford:
We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations... (website with examples) (full paper)
From Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan at Google:
From Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel at University of Toronto:
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models (full paper)
From Junhua Mao, Wei Xu, Yi Yang, Jiang Wang and Alan L. Yuille at Baidu Research/UCLA:
Explain Images with Multimodal Recurrent Neural Networks (full paper)
From Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell at UT Austin, UMass Lowell and UC Berkeley:
Long-term Recurrent Convolutional Networks for Visual Recognition and Description (full paper)
All these came from this Hacker News discussion.
Long article about Rodney Brooks, co-founder of Rethink and former CTO of iRobot:
...Brooks cofounded the Bedford-based iRobot in 1990, and his motivation, he explains, had something to do with vanity: “My thoughts on my self-image at the time was that I didn’t really want to be remembered for building insects.” Then he pauses for a moment and laughs. “But after that I started building vacuum-cleaning robots. And now there is a research group using Baxter to open stool samples. So now it’s shit-handling robots. I think maybe I should have quit while I was ahead. You know, that’s something no one ever says: ‘I hope my kid grows up to open stool samples... (full article)
From Grabit Inc.:
Grabit technology eliminates the need for part-specific grippers and minimizes gripper changeover, dramatically reducing costs and downtime.
Grabit grippers offer scratch- and smudge-free handling with their clean grasping and eliminate the need to remove residue left by vacuum cups. Grabit’s uniform grasping effect eliminates high “point stresses” on large format glass sheets.
Low Energy & Quiet Operations
Grabit products operate at ultra-low energy levels providing cost savings and enabling mobile robot applications, and also offer quiet operations improving factory conditions and supporting the adoption of collaborative robots... (homepage)
From Japan Times:
iRobot Unveils Its First Multi-Robot Tablet Controller for First Responders, Defense Forces and Industrial Customers
The uPoint MRC system runs an Android-based app that standardizes the control of any robot within the iRobot family of unmanned vehicles. Utilizing the same intuitive touchscreen technology in use today on millions of digital devices, the uPoint MRC system simplifies robot operations including driving, manipulation and inspection, allowing operators to focus more on the mission at hand... (full press release)
YuMi is a human-friendly dual arm robot designed for a new era of automation, for example in small parts assembly, where people and robots work hand-in-hand on the same tasks. YuMi is short for ‘you and me,' working together.
YuMi has been developed to meet the flexible and agile production needs of the consumer electronics industry in the first instance. It will increasingly be rolled out to cover other market sectors. YuMi is a collaborative, dual arm assembly solution with the ability to feel and see. The robot's soft, padded dual arms, combined with innovative force-sensing technology ensure the safety of YuMi's human co-workers. Safety is built into the functionality of the robot itself so that it can work cage-free... (cont'd)
From Nick McCrea at Toptal:
In this article, I’m going to describe the control scheme of my simulated robot, illustrate how it interacts with its environment and achieves its goals, and discuss some of the fundamental challenges of robotics that I encountered along the way... (full article)
From Evan Ackerman at IEEE Spectrum:
IEEE Spectrum has learned that Unbounded Robotics, a spin-off of Willow Garage that developed the UBR-1 mobile manipulator, is currently "in the process of shutting down." The UBR-1 is no longer for sale.
Recently, a source (who asked not to be named) forwarded us an email from Unbounded Robotics CEO Melonee Wise. Our source received the message after requesting a quote for a UBR-1 robot:
"Unfortunately Unbounded Robotics is in the process of shutting down due to issues with our Willow Garage spin off agreement that prevents us from raising series A investment. Unbounded Robotics is no longer selling the UBR-1.... (cont'd)
This challenge evaluates algorithms for object detection and image classification at large scale. This year there will be two competitions:
- A PASCAL-style detection challenge on fully labeled data for 200 categories of objects, and
- An image classification plus object localization challenge with 1000 categories.
NEW: This year all participants are encouraged to submit object localization results; in past challenges, submissions to classification and classification with localization tasks were accepted separately. One high-level motivation is to allow researchers to compare progress in detection across a wider variety of objects -- taking advantage of the quite expensive labeling effort. Another motivation is to measure the progress of computer vision for large scale image indexing for retrieval and annotation... (rules and results)
New York Times article:
Started in 2010 by Stanford, Princeton and Columbia University scientists, the Large Scale Visual Recognition Challenge this year drew 38 entrants from 13 countries. The groups use advanced software, in most cases modeled loosely on the biological vision systems, to detect, locate and classify a huge set of images taken from Internet sources like Twitter. The contest was sponsored this year by Google, Stanford, Facebook and the University of North Carolina.
Contestants run their recognition programs on high-performance computers based in many cases on specialized processors called G.P.U.s, for graphic processing units.
This year there were six categories based on object detection, locating objects and classifying them... (cont'd)
Features ($192):
Tegra K1 SOC
- Kepler GPU with 192 CUDA cores
- 4-Plus-1 quad-core ARM Cortex A15 CPU
- 2 GB x16 memory with 64 bit width
- 16 GB 4.51 eMMC memory
- 1 Half mini-PCIE slot
- 1 Full size SD/MMC connector
- 1 Full-size HDMI port
- 1 USB 2.0 port, micro AB
- 1 USB 3.0 port, A
- 1 RS232 serial port
- 1 ALC5639 Realtek Audio codec with Mic in and Line out
- 1 RTL8111GS Realtek GigE LAN
- 1 SATA data port
- SPI 4MByte boot flash
Dr. Dobb's has an in-depth look here.
Today, Antoine Cully at the Sorbonne University in Paris and a couple of pals say they’ve developed a technique that allows a damaged robot to learn how to walk again in just a few seconds. They say their work has important consequences for the reliability and robustness of future robots and may also provide some insight into the way that animals adapt to injury as well... (cont'd)
Relaxing summertime viewing of an '80s-era KUKA robot being taken apart and dissected (20 parts total).