MODIFICATION
A -- Resilient Autonomous Systems (RAS) - Amendment 1
- Notice Date
- 8/1/2017
- Notice Type
- Modification/Amendment
- NAICS
- 541712 - Research and Development in the Physical, Engineering, and Life Sciences (except Biotechnology)
- Contracting Office
- Department of the Air Force, Air Force Materiel Command, AFRL/RIK - Rome, 26 Electronic Parkway, Rome, New York, 13441-4514, United States
- ZIP Code
- 13441-4514
- Solicitation Number
- BAA-AFRL-RIK-2016-0005
- Point of Contact
- Gail E. Marsh, Phone: 315-330-7518
- E-Mail Address
- Gail.Marsh@us.af.mil
- Small Business Set-Aside
- N/A
- Description
- Part I - Supplemental Information and Part II - Questions and Answers related to the technical requirement identified in SECTION I entitled Funding Opportunity Description.

SUPPLEMENTAL INFORMATION TO SECTION I - FUNDING OPPORTUNITY DESCRIPTION
ATTACHMENT 2
Broad Agency Announcement (BAA) Title: Resilient Autonomous Systems (RAS)
BAA Number: BAA-AFRL-RIK-2016-0005

This document contains Part I - Supplemental Information and Part II - Questions and Answers related to the technical requirement identified in SECTION I entitled Funding Opportunity Description.

PART I - Autonomy Test Evaluation Environment (ATE2) Supplemental Information Document

The objective of this document is to provide additional information about the Autonomy Test Evaluation Environment (ATE2) simulation software. The ATE2 is in-house developed software. This simulation software will be made available as Government Off-The-Shelf (GOTS) software to all that are interested. To clarify, the two most important features of this BAA call are to demonstrate the intelligent planning (re-planning) and learning behavior of agents in denied environments (A2AD) in response to an intelligent adversary.

Abbreviations:
BF - Blue Force
RF - Red Force
Mi - Miles
IADS - Integrated Air Defense System
ISR - Intelligence, Surveillance and Reconnaissance
EW - Electronic Warfare

Simulation Environment:
ATE2 is a highly scalable, multi-agent simulation framework. It is a low-to-medium fidelity simulator that emphasizes high scalability (a high number of agents and assets) rather than an exacting, low-level approach to modeling platforms. The ATE2 simulation, example implementation source code, documentation, and the evaluation scenario will be provided to performers as GFE. Included in the documentation will be tutorials that demonstrate how to develop a new blue force agent implementation, how to configure the simulator (scenario), and how to utilize the simulator as a whole. The ATE2 software is written in Java and does not require any commercial licensing. The simulator can be interacted with via the Graphical User Interface (by a human) or programmatically using the Java API (what an agent would use).

The simulation consists of a blue force and a red force made up of assets (platforms). These assets are each controlled by agents. The blue force will have one mission (ISR) and the red force will have an opposing mission (deny, degrade, and/or destroy BF over its territory). These missions will consist of a set of tasks. An example of a simple mission for the blue side: 1) locate three trucks in an area defined by {x, y, width, height}; and 2) collect intelligence on a building with a specific UID located at {x2, y2}. There is a 1:1 pairing of one agent for each asset (blue or red) in the simulation. Each agent will only have the ability to execute actions on the asset that it owns; likewise, each agent will only be able to observe the environment based on what its own asset was able to sense. To perform well in increasingly adverse A2AD scenarios, blue force agents will need to engage in distributed planning and re-planning. Centralized planning is only possible at the beginning of the simulation, when all blue force assets, located at their base, would benefit from relatively reliable communications. Each time step of the simulation will represent one minute of simulation time. At the beginning of the simulation, all of the blue assets will be located at their base.
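The mission/task structure described above is not given a concrete schema in this document. The following is a minimal sketch, assuming hypothetical Java record types (AreaSearchTask, PointCollectTask) and placeholder coordinates, of how the two example blue tasks could be represented; it is illustrative only and is not the ATE2 mission format.

// Hypothetical sketch only - the real ATE2 task classes and field names may differ.
import java.util.List;

public class MissionSketch {

    // A search task over a rectangular area, e.g. "locate three trucks".
    record AreaSearchTask(String targetType, int count,
                          double x, double y, double width, double height) {}

    // A point-collection task against a known target UID, e.g. "collect intelligence on a building".
    record PointCollectTask(String targetUid, double x, double y) {}

    public static void main(String[] args) {
        // The example blue mission from the text, expressed with the hypothetical types above.
        List<Object> blueMission = List.of(
                new AreaSearchTask("truck", 3, 1000.0, 2000.0, 500.0, 500.0),
                new PointCollectTask("BLDG-017", 3200.0, 1800.0));
        blueMission.forEach(System.out::println);
    }
}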
However, the red assets will already be located at their desired locations, likely spread out within the bounds of the simulated environment. The blue force agents will be given the blue mission; likewise, the red force agents will be given the red mission. The following are the actions available to agents on the blue force and on the red force:

Blue Force Action Set:
• Move own asset to a location in the environment (x, y, velocity (m/s)), where the velocity must be within the range supported by the vehicle
• Send message to target asset

Red Force Action Set:
• Move own asset to a location in the environment
• Fire weapon at target asset
• Send message to target asset
• Jam area

Blue force agents would have two Java methods available to them to act within the environment. The first would be to move their asset to another location in the environment, whereby they specify the x, y coordinates (a waypoint), as well as the desired velocity, in meters per second, at which the asset should travel. The second method would enable the agent to transmit a message from their asset to other assets in the environment (while taking into account COMM range, jamming effects, and so forth).

Gathering intelligence from a target works in this manner: each target will have a set of intelligence strings. Any asset that is gathering intelligence on that target will be able to collect a variable number of intelligence strings per simulation time step. Further, there will be different types of intelligence strings; thus, in order to gather intelligence strings of type X, the blue asset must have a corresponding type X sensor. Duration and proximity to the target will also have an effect on the quality of the assessment of the target. Gathering intelligence is performed passively and automatically. The agent does not need to engage or orient the sensor, or perform sensor fusion. Once the blue force asset is within range of a red force asset, it will begin gathering intelligence on it.

The agents determine the state of the simulation by examining the observations about the environment made by their assets. The following describes what the agent is able to examine regarding the state of the simulation. These environment observations will be provided in JSON format.

Environment Observations:
• All observed platforms within detection ("vision") range:
  o Asset Type
  o Global Unique Identifier
  o Alignment (team number)
  o Location (x, y)
  o Orientation (0-360)
  o Number of Intelligence Strings (and type)
• Owned asset status:
  o Asset Type
  o Global Unique Identifier
  o Alignment (team number)
  o Location (x, y)
  o Orientation (0-360)
  o Velocity (m/s)
  o Fuel remaining (liters)
  o Message inbox
• Other Entities (e.g., a road section)

Each asset will be outfitted with a Propulsion module, a Sensor module, a COMMs module, and, in the case of red assets only, a Weapons module. In some cases, depending on the asset type, the asset could have multiple modules in each of the aforementioned categories. At the configuration stage of a simulation, each asset can be configured with different modules; however, once the simulation begins, the modules cannot be replaced or swapped out. By the time the agents become active, they will not have the ability to configure their asset; rather, they will have to work with what they have. The Propulsion module enables the asset to traverse its environment. It consumes available fuel while in use. The Sensor module allows an asset to obtain intelligence on targets.
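To illustrate the two-action blue force interface described above, the following is a minimal agent skeleton. It is a sketch only: the real ATE2 Java API will be defined in the GFE documentation, and every type and method name below (AssetController, moveTo, sendMessage, BlueAgent, ObservedPlatform, the team-number and waypoint values) is an assumption that merely mirrors the two documented actions and a subset of the observation fields.

// Sketch of a blue force agent against a hypothetical ATE2-style API.
// None of these type or method names are confirmed; they only mirror the
// two documented blue actions (move with a velocity, send a message).
import java.util.List;

interface AssetController {                       // hypothetical handle to the agent's own asset
    void moveTo(double x, double y, double velocityMetersPerSecond);
    void sendMessage(String targetAssetId, String message);
}

record ObservedPlatform(String assetType, String uid, int alignment,
                        double x, double y, double orientation) {}  // subset of the observation fields

interface BlueAgent {
    // Called once per simulated minute with the asset's latest observations.
    void step(AssetController ownAsset, List<ObservedPlatform> observed);
}

class SimpleSearchAgent implements BlueAgent {
    @Override
    public void step(AssetController ownAsset, List<ObservedPlatform> observed) {
        // If a non-blue platform is observed, loiter near it so the passive sensor keeps collecting;
        // otherwise continue toward a pre-planned search waypoint.
        for (ObservedPlatform p : observed) {
            if (p.alignment() != 1) {                       // assume team 1 = blue
                ownAsset.moveTo(p.x(), p.y(), 20.0);        // 20 m/s, within the Group 1 speed limit
                ownAsset.sendMessage("BLUE-BASE", "Observed " + p.assetType()
                        + " at " + p.x() + "," + p.y());
                return;
            }
        }
        ownAsset.moveTo(5000.0, 5000.0, 40.0);              // hypothetical search waypoint
    }
}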
The Sensor module also allows the asset to detect the presence and status of other assets (blue and red) in its detection range. The Comms module enables the agent to transmit messages/data from its asset to other assets within communications range. The format is purely arbitrary - whatever can be contained within a String object. The Weapons module enables the agent to fire a weapon (missile, bomb, etc.) at a target asset. If the Weapons module does not contain any ammunition, then it will not be able to fire. Weapon modeling is very simple. For example, say at a given time step, asset A fires a weapon at asset B. On that same time step, the simulation determines whether or not the weapon successfully destroyed asset B (assuming the weapon would reach it in one to five minutes of simulated time). The simulation does not model the trajectory of the weapon, the impact, or anything along those lines. An IADS, like any other asset in the simulation, has a fixed range and a finite amount of ammunition that it can fire. It will not be able to fire endlessly at targets. IADS will be part of the red force.

The simulator will be provided as a JAR file along with documentation on how to set up and launch the simulator. The documentation will also detail how to develop and incorporate a new agent implementation. The source code for the actual simulator will not be provided.

Blue Force (BF):
The BF will consist of several classes of UAVs. Each BF asset will have a finite quantity of fuel, a finite set of sensors, and a limited-range communication model to communicate with other BF assets. The communication model will be based on a Gaussian distribution, and the bandwidth between assets will be correlated to a Gaussian distribution. The BF assets could be assumed to be quadrotors or, in general, lighter multi-rotor craft. These assets would be considered Group 1 UAVs, with a maximum weight of twenty pounds. Speed would not exceed 100 kts (51.44 meters per second). The BF will be given a limited set of assets with different types of sensor, comm, and propulsion payloads.

Red Force (RF):
A large portion of the RF will consist of stationary assets; however, some, if not several, will be mobile. The capabilities of the RF will consist of the following: (1) detect BF assets within sensor range, (2) jam communications within a certain region, (3) shoot down an asset, (4) move to a new location, and (5) transmit a message to a friendly asset (i.e., an asset on the same team). The RF jamming capabilities have a limited effective range based on a Gaussian distribution - the shorter the distance between the RF jammer and the BF asset, the stronger the jamming effect. A BF asset will not be able to utilize its communications if it is within the effective range of an RF jamming device. The red IADS will have a finite range and finite ammunition. The IADS range is based on a Gaussian distribution, more specifically on the proximity between the IADS and the BF asset. Examples of red force assets: intelligence targets, stationary and mobile anti-air assets, and stationary and mobile jamming assets. The stationary assets will typically be more effective and have greater range as compared to mobile units.

Measurements:
Regarding the measurement of agent performance, the objective of the BF will be to perform the ISR mission, consisting of some finite number of ISR tasks, while minimizing the loss of BF assets and resources.
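The Gaussian range models mentioned above (comm bandwidth, jamming effect, IADS engagement) are not given numeric parameters in this document. The sketch below shows one plausible reading, in which effectiveness decays with distance along a Gaussian curve; the method name, the sigma value, and the sample distances are assumptions for illustration only, not ATE2 parameters.

// Illustrative Gaussian falloff model: effectiveness in [0, 1] decays with distance.
// The ATE2 simulator's actual parameters are not published; sigma here is a placeholder.
public class GaussianFalloffSketch {

    // Relative effectiveness of a jammer (or IADS shot) at the given distance in meters.
    static double effectiveness(double distanceMeters, double sigmaMeters) {
        return Math.exp(-(distanceMeters * distanceMeters) / (2.0 * sigmaMeters * sigmaMeters));
    }

    public static void main(String[] args) {
        double sigma = 2000.0;  // assumed 2 km scale; purely illustrative
        for (double d : new double[] {0, 500, 1000, 2000, 4000}) {
            System.out.printf("distance %.0f m -> effectiveness %.2f%n",
                    d, effectiveness(d, sigma));
        }
    }
}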
To support that, the following attributes will be collected by the blue force asset modeled in ATE2: Red ID, Red Asset Type, Location, and Time. In addition to the assessment of predefined ISR tasks, the BF asset may detect previously unknown RF assets and may need to utilize the right sensor, duration, and/or proximity to identify them. Recording the correct location of newly-discovered (previously unknown) assets will benefit an agent's score. The ATE2 simulation environment will contain a network of roads; however, the adversary may or may not utilize the network of roads for its movement.

Scenario:
The scenario duration is 72 hours.

Scenario Table:
Agents                 | Collections | Red Force Capability | Operating Space
10 Blue; 10 Red agents | 100's       | No adaptation        | 1 Mi2
20 Blue; 20 Red agents | 100's       | Adaptation           | 1 Mi2
50 Blue; 50 Red agents | 1000's      | No adaptation        | 10 Mi2
>100 total agents      | 1000's      | Adaptation           | 10 Mi2

PART II - Questions and Answers (each answer is prefixed with "A)"):

Q) What is the objective within the simulation environment?
A) The objective is to locate targets of interest in the environment as accurately as possible (i.e., capturing the coordinates of said targets over a period of time), reduce the expenditure of fuel, and reduce the number of assets lost during the course of the scenario.

Q) What are agents expected to learn?
A) The agents are expected to learn to coordinate as a team (or set of teams) and posture themselves in the environment so as to minimize the negative effect on the mission, regardless of what the enemy does or what surprises appear.

Q) Is the adversary attack model stochastic?
A) Yes. There will be uncertainty attached to every weapon launch. The most significant factors that determine whether or not an attack is successful will be distance and the type of weapon utilized.

Q) What is the range of the threat function / sensing model?
A) A conditional Gaussian distribution model based on proximity to the sensor.

Q) Does the adversary know the full capability of the BF?
A) No, the red force will not know beforehand how the blue force's assets are configured. They will only know what their assets observe as the simulation progresses.

Q) What does 1000's of ISR collects mean?
A) Observations for both red and blue entities of interest (e.g., waypoints, planning items, object detections, tracks).

Q) What kind of controls for the UAV are available?
A) The agent is able to control where the UAV goes in the environment by specifying waypoints (latitude, longitude). The UAV is also able to transmit and receive messages (whereas the sensors used to acquire intelligence are automatic). All of these aspects are modeled at a very abstract level.

Q) Are the UAV controls 2D or 3D?
A) The simulation is 2D. For example, the modeled sensor only works in a 2D fashion (altitude isn't taken into account). However, the agent is able to specify waypoints (latitude and longitude).

Q) Simulation software availability - who will it be available to?
A) The intent is to make the Resilient Autonomous Systems (RAS) Simulator available to all that are interested.

Q) The document discusses different capabilities of the UAVs - does that include different payloads with unique capabilities?
A) Yes, different platforms may be equipped with different sensors, comm equipment, etc. Therefore, some platforms will be better suited to gathering intelligence on a particular red force asset.

Q) Will the decision need to be made as to which type of payload works best for a specific target?
A) The assets will already be pre-configured.
While executing the mission, it will be up to the agents to decide amongst themselves who has the most suitable payload onboard for a particular red force target.

Q) Will all of the agents for the blue force or for the red force have to be homogeneous?
A) No, this will not be required. Conceivably, a different agent could be developed for each of X number of differently-configured assets.

Q) Will the system need to figure out how to best transmit imagery back to the command post, or do we assume that once the target is found/imaged, it counts?
A) The agents will have to collectively determine the best way to relay intelligence back to the command post, whether that involves routing it back through a chain of blue force platforms or an individual agent carrying the intelligence back on its own. If the intelligence gathered by an asset for one reason or another doesn't make it back to base, then it will not count.

Q) What if that UAV is then "shot down" - locally stored intelligence is lost, correct?
A) Correct. If the agent in control of that particular asset had not successfully relayed any intelligence to another agent, then that intelligence is lost. That opens up the need to re-acquire that intelligence with a different, still-active blue force asset.

Q) What about constraints with payloads - bandwidth, storage, etc.? How is this handled?
A) Storage size may or may not be a consideration in the final version of the simulation; however, communication bandwidth will be. It will take a certain length of time for one asset to completely transmit information to another asset. Jamming and distance will impact the success of information transmission.

Q) Is it the intention that collected information makes it back to HQ as soon as possible, or is it assumed that data is not collected until assets return to base?
A) Data is not collected until the asset carrying it gets within communication range of its base.

Q) If it is assumed that we want to get data back to HQ ASAP, how much weighting is applied to the system to make it happen vs. gaining additional intelligence?
A) It depends on your approach. Taking time constraints into consideration, the asset may immediately go back home but miss finding another target's location (and consume more fuel); however, if the asset doesn't send the information back home and keeps searching for a new target, it may end up getting destroyed, thereby ending up with a worse result. The point is that the agents, both collectively and individually, are going to have to weigh the risk given their current circumstances (as well as the value of the intelligence already obtained).

Q) Are threats basic plots, and if a UAV comes near a certain threat, is the UAV "hit" or jammed?
A) An asset is "hit" when a red asset fires a weapon that, based on some pre-established probability of success, collides with the UAV. Thus, the red asset must explicitly fire at the blue asset in order for the hit to occur. Regarding jamming, a red force asset must deliberately jam within its area - if a blue asset is within range of the jamming effect, then it will to some extent lose its ability to effectively communicate or use its sensors. The closer that the blue asset is to the red asset performing the jamming, the greater the likelihood that the blue asset will be jammed.

Q) Can we use any counter-capabilities (different comm freqs, chaff, flares, etc.)?
A) No, for the COMM capability.
The idea, at a high level, is that the blue assets may have alternative communication methods available that are not being jammed - this depends on how a particular asset is equipped in a given scenario (i.e., a different comm module). Chaff and flares may or may not be included in the simulation, particularly since the simulation time step will be between one minute and five minutes. Countermeasures may end up being automatic, whereby a blue force asset is less likely to be shot down while it still has a store of countermeasures, and likewise more likely to be shot down when it runs out.

Q) Is it assumed that in EW areas, all communications are totally jammed, or is it the intention that assets should "learn" about the jamming mechanism such that they are able to improve communications over time?
A) The agents would ideally learn ways of operating in spite of the jamming, albeit in a likely degraded fashion. For example, agents may decide to fly outside of the range of jamming, or at least to a range at which the jamming is not as effective (thereby permitting some communication to pass through).

Q) Is the intent to just show basic comms, not detailed protocols, hardware, etc.?
A) The intent is to model comms at a very high level. From the agent's perspective, communication will be as simple as sending a message to assets within communications range with a single API call within the simulator. The agent won't have to worry about what's going on under the hood.

Q) Are the comms links just a binary available vs. unavailable, or are there levels of degradation?
A) The intent is to have levels of degradation, such that messages may be lost at a certain rate, and individual messages may be partially garbled.

Q) "Intermittent" assumes that the comms come back, so if the UAV leaves a certain area, do comms links come back?
A) If a blue force asset leaves an area that is being jammed, then comm links will come back. That, of course, assumes the blue force assets are still within comm range of each other.

Q) Does jamming mean broad signal jamming, not spoofing, hacking, etc.?
A) No spoofing, no hacking, or anything along those lines. This will all be modeled at a high level. Jamming within this simulation will be as simple as preventing communication between blue assets or the utilization of sensors.

Q) Along those lines, are communications assumed to be perfectly secure, or should assets only attempt to communicate when they assume they can do so without compromising data?
A) For the sake of simplicity, communications are assumed to be perfectly secure. That is, we assume some other component has taken care of comm security for us.

Q) Is there a weight against transmissions? For instance, is it assumed that the adversary will attempt to track blue force assets based on their radiated emissions?
A) Yes. The more that an asset attempts to communicate, both in frequency and amplitude, the more that the asset may expose itself to detection.

Q) Explain learning between scenarios - is that during the overall mission, or in successive missions?
A) The same simulated scenario will be executed several times (hundreds, perhaps thousands of runs). The idea is that the agent would ideally learn from its experience in one run and perform better in a future run (even when presented with a different scenario configuration). This implies that there should be some random rearrangement or reshuffling of the RF.

Q) How much does the scenario and set of goals change between runs?
A) The scenario goals would not change within a single set of runs (i.e., the initial conditions would be the same at the start of each run). However, scenarios may differ from each other in terms of: the size of the environment, the number of blue/red assets and the types of assets (as well as their configuration), and the starting location of each asset.

Q) After one evaluation run, can an agent share its history information about this evaluation run with other agents to improve the learning policy for future evaluation runs?
A) Yes.

Q) Between evaluation runs, will the evaluation results (as ground truth) be available from previous runs to improve the policy for future evaluation runs?
A) Yes.

Q) Is the system supposed to "learn" about the scenario, or rather about the adversary?
A) The aim of this effort is to develop/train agents that are more resilient in the decisions that they make, such that deviations between what is expected and what actually occurs in the environment (e.g., a surprise adverse event) have less of a negative impact on mission success. The goal is to preserve as much mission effectiveness as possible in most, if not all, cases. Thus, if an agent performs well in one scenario, it should still perform similarly well in a differently-configured scenario. It is not useful if the agent is so narrowly focused that small modifications to the scenario effectively disrupt its ability to perform the mission.

Q) What will be provided to the performers in the way of the simulation software?
A) Performers will be given the compiled version of the ATE2 simulation engine (packaged in a Java ARchive (JAR) file), along with an API that describes how to set up the simulation, how to develop a new agent and add it to the simulation, how to execute the simulation, and how to obtain the results (ground truth) of a simulation run. The source code for the simulator will not be released.

Q) Will sample red forces be part of the ATE2 GFE?
A) Yes. A sample red force AI (agents) will be provided to control the red force assets in the simulation. The initial red force, understandably, will be very simplistic and not adaptive.

Q) At what level of fidelity are the red forces modeled?
A) The red force assets will be modeled at the same level of fidelity as the blue force assets (i.e., low fidelity, very abstract).

Q) What kinds of adversary smarts can we expect to see?
A) Anything from a simple heuristic to an adversary that will adapt to and learn against the blue force agents.

Q) How are jammers modeled? Can they be targets for strike?
A) Jammers are modeled at a very high level. While active, a red force jammer will disrupt communications and sensors for blue force assets at a given distance, where the effectiveness of the jammer falls off as the distance between the jammer and the jammed asset increases.

Q) How mobile are the air defense assets?
A) The RF will have two types of air defense assets: stationary and mobile. The RF air defense assets that are not stationary will move at roughly the speed of typical land vehicles. Essentially, it should be assumed that any red force asset can move anywhere within the bounds of the simulation environment if enough time is allotted.

Q) Are there any red force airborne defenses?
A) At this time, there are no plans to include airborne red force assets. There will be only ground-based red force assets.

Q) How are red force targets located?
A) In order to locate a red force target asset, the blue force asset must get within detection range, which will vary depending on the sensor equipped on each blue force asset. The closer that the blue force asset gets to the red force asset, the greater the likelihood that the blue force asset will find it. Further, once the blue force asset gets too far away from the red force asset, or if there is active jamming in the area, the blue force will lose track of the red force asset's current location.

Q) How are red force assets identified by a blue force asset?
A) A blue force asset must first get within range of the red force asset. Every tick of the simulation, a determination will be made whether or not the blue force asset has identified the red force asset. Identification will reveal the asset type, team alignment, and so on of the red force asset to the blue force asset.

Q) How do blue force assets attack red force assets?
A) The blue force will not attack red forces. The goal of this effort is to adapt to adversity.

Q) Is cooperation among multiple assets required?
A) Cooperation in most cases will increase the likelihood of success, particularly since red force assets will be coordinating their own efforts to achieve their opposing mission.

Q) Do targets need to be tracked?
A) Yes, some targets will need to be tracked over time, depending on the scenario configuration. The longer that the blue force has an accurate idea of where a target red force asset has moved, the better it will have performed.

Q) What does it mean to "positively locate" a target?
A) This means a blue force asset must determine the type and the correct/accurate location of a target at a given point in time (the higher the accuracy, the better). Conceivably, if the blue force asset is too far away, or if the blue force asset is being affected by jamming, then it may very well believe the red force asset is at an incorrect location.

Q) What kind of comms can we assume within the contested battle area?
A) Within the contested battle area, a reasonable expectation is that there will be regions (perhaps large regions) where comms are being actively jammed by red force assets.

Q) Figure 2 appears to depict a chain of comm relays. Is this a necessary aspect of the approach?
A) It is not a necessary aspect of the approach. It is up to the agent to decide one of three things once it collects intelligence: 1) stop what it is doing and return to base, 2) keep pushing forward with the mission, or 3) coordinate with other agents to assist with getting the information/data back to base.

Q) Is there a timeliness aspect for ISR reporting?
A) It depends. Some ISR tasks will need to be completed sometime within the whole 72-hour window. However, other tasks may have a declared time window in which their completion will be useful.

Q) Is there any air-to-air refueling?
A) No. If an air asset needs to refuel, it will have to return to base.

Q) What is the contemplated milestone schedule for evaluations?
A) Four evaluations within the period of performance for the effort, generally spaced evenly over time.

Q) Do you anticipate there being hardware in the loop?
A) In the future, perhaps, but not within the immediate scope of the Resilient Autonomous Systems project.

Q) Are pop-up, air defense, and transportation targets included in the set of targets scored for coverage and accuracy?
A) A pop-up threat (air defense) would need to be dealt with or avoided - such threats would generally not be the focus of ISR missions.
A pop-up target of opportunity (e.g., an intelligence target), on the other hand, would be scored for coverage and accuracy.

Q) Are transportation targets considered to be pop-ups?
A) They could be. At the beginning of the mission, blue force assets may be told the exact (last known) locations of a transportation target. Or they could be informed that there is an area where the transportation target is believed to be situated. Or a transportation target that was unknown to blue forces at the beginning of the mission could be found within the environment (a pop-up).

Q) Are there other kinds of pop-up threats?
A) In general, pop-up threats will simply be previously unknown red force assets that would present an obstacle to the blue force accomplishing its mission. These would include red force assets that could destroy or jam the blue force assets.

Q) Is striking an air defense target separately scored from detecting and identifying it?
A) The strike capability will not be scored (or add score), since the missions will be ISR-focused. Thus, destroying a target will not necessarily earn any points for an agent; however, the act of destroying a particular target may make it easier (or possible) to successfully perform intelligence-gathering on another red asset in the environment.

Q) Between evaluations, what will be changed and unchanged for red assets (e.g., initial basic parameters including the number of red assets and their locations, as well as more complex parameters including the red asset action policy)?
A) Between evaluations, many aspects of the red force assets could change. Moreover, the available blue force assets may change as well. The most significant change between evaluations, however, will be the agents controlling the red force assets.

Q) After one evaluation run, can an agent share its history information about this evaluation run with other agents to improve the learning policy for future evaluation runs?
A) Yes, the agent history information from one evaluation run can be utilized in future evaluation runs.

Q) Additionally, between evaluation runs, can we assume we can have evaluation results (as ground truth) from previous runs to improve the policy for future evaluation runs?
A) Yes, the evaluation results (ground truth) will be provided and can be used to improve the policy for future evaluation runs. In addition, the plan is to provide the source code for the red force agents.

Q) What kind of connectivity model do we consider between the red force assets (if one sensor detects, do others get notified)?
A) Yes (at least for stationary assets). We can generally assume that if one red force asset is aware of something, then every red force asset will be similarly aware in a relatively short period of time. Stationary RF assets can be assumed to have full connectivity with other stationary assets; mobile assets might have degraded COMM (limited bandwidth in relation to their proximity to other RF assets).

Q) What do you mean by "automated baseline" when the BAA says "achieve at least an order of magnitude improvement in mission effectiveness across the measures of performance (MOP) over an automated baseline"?
A) The "automated baseline" is considered to be a reasonable rendition of what we could be capable of today, using automatic approaches as opposed to truly autonomous ones.
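Part II repeatedly ties blue force scoring to how accurately, and for how long, a target's location is reported. The Government's actual scoring function is not published in this announcement; the sketch below is only one hypothetical way such a per-time-step tracking score could be computed, with an assumed linear error-to-score falloff, an assumed tolerance, and placeholder coordinates.

// Illustrative only - ATE2's actual scoring is defined by the Government, not published here.
public class TrackingScoreSketch {

    // Score in [0, 1] for one reported target position at one time step:
    // 1.0 for a perfect report, decaying toward 0 as the location error grows.
    static double stepScore(double reportedX, double reportedY,
                            double trueX, double trueY, double toleranceMeters) {
        double dx = reportedX - trueX;
        double dy = reportedY - trueY;
        double error = Math.sqrt(dx * dx + dy * dy);
        return Math.max(0.0, 1.0 - error / toleranceMeters);   // assumed linear falloff
    }

    public static void main(String[] args) {
        // A target tracked over three simulated minutes; per-step scores are summed over time,
        // reflecting "the longer the blue force has an accurate idea of where a target is, the better".
        double total = stepScore(1000, 1000, 1005, 998, 100)
                     + stepScore(1010, 1002, 1060, 1040, 100)
                     + stepScore(1020, 1005, 1500, 1400, 100);
        System.out.printf("cumulative tracking score: %.2f%n", total);
    }
}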
- Web Link
- FBO.gov Permalink: https://www.fbo.gov/spg/USAF/AFMC/AFRLRRS/BAA-AFRL-RIK-2016-0005/listing.html
- Record
- SN04607171-W 20170803/170801232404-1471f1425d1c961bbfcdf49dcbd6ea36 (fbodaily.com)
- Source
- FedBizOpps Link to This Notice (may not be valid after Archive Date)