The Embedded Blog: February 2020

Thursday, February 27, 2020

Research tool uses machine learning to predict how fast code will run

By Nick Flaherty www.flaherty.co.uk

Researchers at MIT's CSAIL lab in the US have developed a machine-learning tool that predicts how fast computer chips will execute code from various applications.

Compilers typically use performance models that run the code through a simulation of a given chip architecture and use the results to optimise the code. Developers can then work on the bottlenecks that slow down operation.

However, the performance models for machine code are handwritten by a relatively small group of experts and are not always fully validated. As a result, the simulated performance measurements often deviate from real-life results.

The researchers built a machine-learning pipeline that automates this process, making it easier, faster, and more accurate. The Ithemal tool is a neural-network model that trains on labelled data in the form of “basic blocks” (fundamental snippets of computing instructions) to automatically predict how long it takes a given chip to execute previously unseen basic blocks. The results suggest this performs far more accurately than traditional hand-tuned models.

The researchers presented a benchmark suite of basic blocks from a variety of domains, including machine learning, compilers, cryptography, and graphics, that can be used to validate performance models. They pooled more than 300,000 of the profiled blocks into an open-source dataset called BHive. During their evaluations, Ithemal predicted how fast Intel chips would run code even better than a performance model built by Intel itself using over 3,000 pages describing its chips’ architectures.

“Intel’s documents are neither error-free nor complete, and Intel will omit certain things, because it’s proprietary,” says co-author Charith Mendis, a PhD student at CSAIL. “However, when you use data, you don’t need to know the documentation. If there’s something hidden you can learn it directly from the data.”

In training, the Ithemal model analyzes millions of automatically profiled basic blocks to learn exactly how different chip architectures will execute computation. Importantly, Ithemal takes raw text as input and does not require manually adding features to the input data. In testing, Ithemal can be fed previously unseen basic blocks and a given chip, and will generate a single number indicating how fast the chip will execute that code.

To do so, the researchers clocked the average number of cycles a given microprocessor takes to compute basic block instructions — basically, the sequence of boot-up, execute, and shut down — without human intervention. Automating the process enables rapid profiling of hundreds of thousands or millions of blocks.

The researchers found that Ithemal halved the error rate of traditional hand-crafted models: its error rate was 10 percent, while the Intel performance-prediction model’s error rate was 20 percent on a variety of basic blocks across multiple domains.
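To make the comparison concrete, here is a minimal sketch of how such an error figure could be computed. It assumes that "error rate" means the mean absolute percentage error between predicted and measured cycle counts; the function name and the cycle counts are illustrative and are not taken from the Ithemal evaluation.

```python
def mean_abs_pct_error(measured, predicted):
    """Average of |predicted - measured| / measured over all basic blocks."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - m) / m for m, p in zip(measured, predicted)) / len(measured)


# Made-up cycle counts for four basic blocks (illustrative only).
measured_cycles = [10.0, 20.0, 40.0, 100.0]
learned_model = [11.0, 18.0, 44.0, 90.0]      # each prediction is 10% off
hand_tuned_model = [12.0, 16.0, 48.0, 80.0]   # each prediction is 20% off

learned_err = mean_abs_pct_error(measured_cycles, learned_model)
hand_tuned_err = mean_abs_pct_error(measured_cycles, hand_tuned_model)
print(f"learned: {learned_err:.0%}, hand-tuned: {hand_tuned_err:.0%}")
```

With these sample numbers the learned model's error is half that of the hand-tuned one, which is the shape of the result reported above.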

The tool should allow developers to generate code that runs faster and more efficiently on an ever-growing number of diverse and “black box” chip designs, says Mendis. For instance, domain-specific architectures, such as Google’s Tensor Processing Unit used specifically for neural networks, can be analysed. “If you want to train a model on some new architecture, you just collect more data from that architecture, run it through our profiler, use that information to train Ithemal, and now you have a model that predicts performance,” said Mendis.

“Modern computer processors are opaque, horrendously complicated, and difficult to understand. It is also incredibly challenging to write computer code that executes as fast as possible for these processors,” says co-author Michael Carbin, an assistant professor in the Department of Electrical Engineering and Computer Science (EECS). “This tool is a big step forward toward fully modeling the performance of these chips for improved efficiency.”

In a paper presented at the NeurIPS conference, the team proposed a new technique to automatically generate compiler optimizations. Specifically, they automatically generate an algorithm, called Vemal, that converts certain code into vectors, which can be used for parallel computing. Vemal outperforms hand-crafted vectorization algorithms used in the LLVM compiler.

Next, the researchers are studying methods to make models interpretable. Much of machine learning is a black box, so it’s not really clear why a particular model made its predictions. “Our model is saying it takes a processor, say, 10 cycles to execute a basic block. Now, we’re trying to figure out why,” said Carbin. “That’s a fine level of granularity that would be amazing for these types of tools.”

They also hope to use Ithemal to enhance the performance of Vemal even further and achieve better performance automatically.

www.csail.mit.edu

Wednesday, February 26, 2020

Qualcomm demos 3Gbit/s WiFi 6E at 6GHz

By Nick Flaherty www.flaherty.co.uk

Qualcomm Technologies has demonstrated the next generation of WiFi operating at 6GHz, above the bands in use today.

Despite the attempt to simplify the marketing names of WiFi generations by moving to WiFi 6 and away from the a, b, g and ad names, this next generation is being called WiFi 6E.

The demo at 6GHz uses Qualcomm's FastConnect mobile connectivity subsystem and Networking Pro Series Wi-Fi Access Point platforms.

The new approach, which is still waiting for regulatory approval for the 6GHz band, supports numerous 160 MHz channels and advanced modulation techniques to boost the data rate to 3Gbit/s. Qualcomm likes it as this also gives the opportunity to add extra (non-standard) end-to-end Wi-Fi 6 features to differentiate its offerings.

“Building on our deep technology expertise and industry-proven feature superiority, Qualcomm Technologies is again poised to usher in a new era of Wi-Fi performance and capability with the addition of 6 GHz spectrum, or Wi-Fi 6E,” said Rahul Patel, senior vice president and general manager, connectivity and networking at Qualcomm Technologies. “Once the spectrum is allocated, Wi-Fi 6E is primed to solve for modern connectivity challenges and create new opportunities for the next generation of devices and experiences.”

Commercially available Wi-Fi 6 devices are now based on Qualcomm's Snapdragon 865 Mobile Platform. The latest FastConnect 6800 subsystem is capable of delivering a new class of Wi-Fi speed (approaching 1.8 Gbps) even in densely congested environments. This has support for uplink and downlink MU-MIMO (supporting up to 8 stream scenarios), OFDMA and 1024 QAM extended across 2.4 and 5GHz bands, and latency reducing optimizations.
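As a rough guide to where figures like these come from, the sketch below computes peak Wi-Fi 6 PHY rates from channel width, modulation and stream count. The subcarrier counts, symbol time and guard interval are the standard 802.11ax values; the function name and the stream and band combinations are assumptions for illustration, since the article does not say which configuration Qualcomm used.

```python
# 802.11ax (Wi-Fi 6) data subcarriers per channel width, in MHz.
DATA_SUBCARRIERS = {20: 234, 40: 468, 80: 980, 160: 1960}

SYMBOL_US = 12.8          # OFDM symbol duration in microseconds
GUARD_INTERVAL_US = 0.8   # shortest guard interval


def phy_rate_mbps(width_mhz, streams, bits_per_subcarrier=10, coding_rate=5 / 6):
    """Peak PHY rate; 10 bits per subcarrier corresponds to 1024-QAM."""
    bits = DATA_SUBCARRIERS[width_mhz] * bits_per_subcarrier * coding_rate * streams
    return bits / (SYMBOL_US + GUARD_INTERVAL_US)  # bits per microsecond = Mbit/s


# One 160 MHz channel, single spatial stream.
single_160 = phy_rate_mbps(160, streams=1)

# Assumed dual-band case: two streams on 80 MHz (5GHz) plus two on 40 MHz (2.4GHz).
dual_band = phy_rate_mbps(80, streams=2) + phy_rate_mbps(40, streams=2)

print(f"160 MHz x1: {single_160:.0f} Mbit/s, dual band: {dual_band:.0f} Mbit/s")
```

A single 160 MHz stream comes out at about 1.2 Gbit/s, so multi-gigabit rates need several streams on wide channels. The dual-band combination is one configuration that lands near the 1.8 Gbps quoted above, but it is an assumption and not confirmed by the source.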

The Networking Pro Series provides networking processing and deterministic resource allocation through multi-user algorithms for OFDMA and MU-MIMO, and has been used in 200 designs shipping or in development.

Qualcomm's Wi-Fi 6E page.

Monday, February 24, 2020

Researchers hack MobilEye camera chip with tape

By Nick Flaherty www.flaherty.co.uk

Researchers at McAfee Advanced Threat Research (ATR) have hacked the machine learning algorithms in a MobilEye camera chip used in Tesla cars.

The team looks at model hacking, the study of how hackers could target and evade artificial intelligence, with a focus on the broadly deployed MobilEye camera system. This is used in over 40 million vehicles, including Tesla models that implement Hardware Pack 1.

The team looked at ways to cause misclassifications of traffic signs and were able to reproduce and significantly expand upon previous research that focused on stop signs. This included both targeted attacks, which aim for a specific misclassification, and untargeted attacks, which don’t prescribe what an image is misclassified as, just that it is misclassified. The team was successful in creating extremely efficient digital attacks that could cause misclassifications of a sign.
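The difference between targeted and untargeted attacks can be sketched on a toy linear classifier. This is not McAfee's method and has nothing to do with the MobilEye classifier: the weights, feature vector and step sizes are made up, and real attacks perturb images fed to neural networks. It only illustrates that a targeted attack pushes the input towards one chosen wrong label, while an untargeted attack accepts any wrong label.

```python
# Toy linear classifier: one weight row per class (all values are made up).
CLASSES = ["stop", "added lane", "speed limit"]
WEIGHTS = [
    [1.0, 0.5, -0.5, 0.0],
    [0.5, 1.0, 0.0, -0.5],
    [-0.5, 0.0, 1.0, 0.5],
]


def scores(x):
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]


def predict(x):
    s = scores(x)
    return s.index(max(s))


def sign(v):
    return (v > 0) - (v < 0)


def push_towards(x, source, goal, eps):
    """Move each feature by eps in the direction that favours goal over source."""
    return [v + eps * sign(g - s)
            for v, g, s in zip(x, WEIGHTS[goal], WEIGHTS[source])]


def targeted_attack(x, target, eps):
    return push_towards(x, predict(x), target, eps)


def untargeted_attack(x, eps):
    s = scores(x)
    true_class = s.index(max(s))
    # Any wrong label will do, so aim for the runner-up class (the closest one).
    runner_up = max((c for c in range(len(s)) if c != true_class), key=lambda c: s[c])
    return push_towards(x, true_class, runner_up, eps)


image = [1.0, 0.6, 0.2, 0.1]                       # classified as "stop"
forced = targeted_attack(image, target=2, eps=0.4)  # aim for "speed limit"
any_wrong = untargeted_attack(image, eps=0.1)       # aim for anything but "stop"
print(CLASSES[predict(image)], CLASSES[predict(forced)], CLASSES[predict(any_wrong)])
```

In this toy setup the untargeted attack succeeds with a smaller perturbation, because it only has to cross the nearest decision boundary.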

They used physical stickers, shown below, that model the same type of perturbations, or digital changes to the original photo, which trigger weaknesses in the classifier and cause it to misclassify the target image.

Targeted physical white-box attack on stop sign, causing custom traffic sign classifier to misclassify the stop sign as an added lane sign

This set of stickers has been specifically created with the right combination of colour, size and location on the target sign to cause a robust webcam-based image classifier to think it is looking at an “Added Lane” sign instead of a stop sign.

The team then repeated the stop sign experiments on traffic speed limit signs.

Physical targeted black-box attack on speed limit 35 sign resulting in a misclassification of the sign to a 45-mph sign

Black-box attack on the 35-mph sign, resulting in a misclassification as a 45-mph sign. This attack also transfers to state-of-the-art CNNs, namely Inception-V3, VGG-19 and ResNet-50.

After testing in the lab using a high resolution webcam, the team took the technology out onto the road. A 2016 Model “S” and a 2016 Model “X” Tesla with MobilEye's EyeQ3 camera chip were tested. The adversarial stickers convinced the Tesla Head Up Display (HUD) that the speed limit was 85mph.

These adversarial stickers cause the MobilEye on Tesla Model X to interpret the 35-mph speed sign as an 85-mph speed sign

The lab tests developed attacks that were resistant to changes in angle, lighting and even reflectivity, to emulate real-world conditions. The team reduced the attack from four adversarial stickers, placed in the only locations that could confuse the webcam, all the way down to a single piece of black electrical tape, approximately 2 inches long, extending the middle of the 3 on the traffic sign.

A robust, inconspicuous black sticker achieves a misclassification from the Tesla model S, used for Speed Assist when activating TACC (Traffic Aware Cruise Control)

Even to a trained eye, this hardly looks suspicious or malicious, and many who saw it didn’t realise the sign had been altered at all. This tiny sticker was all it took to make the MobilEye camera’s top prediction for the sign 85 mph.

The vulnerability comes from the fact that Tesla's Traffic Aware Cruise Control (TACC) can use speed limit signs as input to set the vehicle speed. A software release for TACC shows that the data is fed into the Speed Assist feature, which was rolled out by Tesla in 2014.

McAfee ATR’s lead researcher on the project, Shivangee Trivedi, partnered with another vulnerability researcher and Tesla owner Mark Bereza to link the TACC and Speed Assist technologies. On approaching the hacked sign, the Tesla started speeding up to the new speed limit.

The number of tests, conditions, and equipment used to replicate and verify misclassification on this target were published by McAfee in a test matrix.

The team points out that this was achieved on earlier versions of the MobilEye camera platform (Tesla Hardware Pack 1, MobilEye version EyeQ3). A 2020 vehicle implementing the latest version of the MobilEye camera did not appear to be susceptible to this attack vector or misclassification. The newest models of Tesla vehicles do not implement MobilEye technology any longer, and do not currently appear to support traffic sign recognition.

However, the vulnerable version of the camera continues to account for a sizeable installed base among Tesla vehicles.

The video of the testing is at www.mcafee.com

Wednesday, February 19, 2020

Energy efficient silicon ships for edge AI

By Nick Flaherty www.flaherty.co.uk

Eta Compute has shipped the first production version of its ECM3532 embedded AI processor.

The multicore chip uses a patented technology called Continuous Voltage Frequency Scaling (CVFS) to bring power consumption down to microwatts in many sensing applications.

The Neural Sensor Processor (NSP) is aimed at local machine learning in always-on image and sensor applications at the edge of the Internet of Things (IoT). The self-timed CVFS architecture automatically and continuously adjusts internal clock rate and supply voltage to maximize energy efficiency for the given workload, with power consumption typically around 100μW.
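Why scaling voltage and frequency together saves energy can be sketched with the classic CMOS switching-power estimate, P = C x V^2 x f. The article does not describe CVFS internals, so this is only the textbook relation; the capacitance, voltages and clock rates below are hypothetical, and leakage power is ignored.

```python
def dynamic_power_watts(capacitance_farads, voltage, frequency_hz):
    """Classic CMOS switching-power estimate: P = C * V^2 * f."""
    return capacitance_farads * voltage ** 2 * frequency_hz


def energy_per_cycle_joules(capacitance_farads, voltage):
    """Energy per clock cycle is P / f, so it depends only on C and V."""
    return capacitance_farads * voltage ** 2


C_EFF = 50e-12  # hypothetical effective switched capacitance (50 pF)

# Hypothetical operating points: a fast one and a slower, lower-voltage one.
fast = dynamic_power_watts(C_EFF, voltage=1.0, frequency_hz=100e6)
slow = dynamic_power_watts(C_EFF, voltage=0.6, frequency_hz=20e6)
energy_ratio = energy_per_cycle_joules(C_EFF, 0.6) / energy_per_cycle_joules(C_EFF, 1.0)

print(f"fast: {fast * 1e3:.2f} mW, slow: {slow * 1e6:.0f} uW, "
      f"energy per cycle ratio: {energy_ratio:.2f}")
```

Lowering the clock alone cuts power but not the energy spent per cycle; lowering the supply voltage as well cuts both, which is why the two are adjusted together.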

The chip combines an ARM Cortex-M3 processor with 256KB SRAM and 512KB Flash and a 16-bit dual-MAC DSP with 96KB dedicated SRAM, both with CVFS, alongside I/O, peripherals and a machine learning software development platform. A Neural Development SDK with TensorFlow interface provides the ML model integration.

“Our Neural Sensor Platform is a complete software and hardware platform that delivers more processing at the lowest power profiles in the industry. This essentially eliminates battery capacity as a barrier to thousands of IoT consumer and industrial applications,” said Ted Tewksbury, CEO of Eta Compute. “We are excited to see the first of many applications our customers are developing come to market later this year.”

“We believe that power consumption, latency and data generation combined with RF transmission are all factors limiting many sensing applications,” said Jim Feldhan, president and founder at Semico Research. “It’s great seeing Eta Compute’s platform coming into the market. Their technology is orders of magnitude more power-efficient than any other technology I have seen to date and it will certainly make AI at the edge a reality.”

“It’s exciting to see innovative products for low power machine learning being launched at tinyML where experts from the industry, academia, start-ups and government labs share the innovations to drive the whole ecosystem forward,” said Pete Warden, Google Researcher and General Co-chair of the tinyML organization.

“We are amazed by the ECM3532 and its efficiency for machine learning in sensing applications,” said Zach Shelby, CEO of Edge Impulse. “It is an ideal fit for our TinyML lifecycle solution that transforms developers’ abilities to deploy ML for embedded devices by gathering data, building a model that combines signal processing, neural networks and anomaly detection to understand the real world.”

“Himax Imaging HM01B0 and new HM0360 are among the industry’s lowest power image sensors with autonomous operation modes and advanced features to reduce power, latency and system overhead. Our image sensors can operate in sub-mW range and when paired with the low power multi-core processors such as Eta Compute’s ECM3532, developers can quickly deploy edge devices that perform image inference under 1mW,” said Amit Mittra, CTO of Himax Imaging.

The ECM3532 is packaged in a 5 x 5 mm 81 ball BGA.

EtaCompute.com