* clean up achievements; fix value accrual time; report flows better
* use pause, remove value accrual time
* make clients sleep correct time, add more speed and pausing methods to instance, add tests
* server adminlist
* clean up code, add more Instance methods, render pause message, tests passing
* add tests for elapsed ticks
* fix run_eval
* game control
* tests
* tests
* task info
* game control and medium electric poles
* connect_entities and place_entity_next_to edits
* change prints, max achieved throughput
* ast fixes - augmented assignment and some others
* changes for get_entities (groups) and connect_entities error logging
* better connect_entities behavior when no new entities are placed, better grouped entity behavior, better error messages
* fixes for tests
* item-on-ground, grouped entities
* updated tests
* ast tests and some other tweaks, all tests passing
* add connect tests
* remove analysis directory
* reward override, prep for evals
* sessions based
* try out caching + no sleep
* update fixture usage
* better reset usage
* stateless on tech, probably a breaking change
* better fixtures + decouple resets
* use pytest-xdist w 2 servers
* using diff grouping for dep
* formatting
* formatting
* caching for image
* formatting
* formatting
* use uv
* use uv caching
* remove docker caching (it's slower)
* how about 4 workers?
* no redundant resets
* parameterize
* change names
* update all_technologies_researched usage
change log:
- used uv and cache dependencies
- used 2 factorio headless server instances
- added pytest-xdist & used 2 pytest workers
- parametrized the slowest test, `test_sleep.py`, to balance it across workers
- clarified resets in `instance.py` so separate instances aren't needed for research testing
- better fixture usage, with autouse reset
- added configure_game callback for per-test-file setup of inventories & research state
- updated task abc all_technologies_researched usage; it's now a param for reset
- using 4 workers instead of 2; can probably double it again
- pytest parameterized a slow test
- fixed redundant reset in conftest
final speedup: 9m 4s -> 1m, ≈9.07× faster
merging now because main is broken without it.
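The change log does not say how pytest-xdist workers are assigned to the headless servers. A minimal sketch, assuming a simple round-robin over the worker number: `server_index_for_worker` and `current_server_index` are hypothetical helpers, not names from the repo. The only real interface used is the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in each worker process (`gw0`, `gw1`, ...).

```python
import os
import re


def server_index_for_worker(worker_id: str, num_servers: int) -> int:
    """Map a pytest-xdist worker id such as "gw3" to a server index (round-robin)."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    match = re.fullmatch(r"gw(\d+)", worker_id)
    # Anything that is not "gwN" (e.g. "master") means tests are not distributed,
    # so fall back to the first server.
    worker_number = int(match.group(1)) if match else 0
    return worker_number % num_servers


def current_server_index(num_servers: int) -> int:
    """Read the worker id from the environment variable set by pytest-xdist."""
    worker_id = os.environ.get("PYTEST_XDIST_WORKER", "master")
    return server_index_for_worker(worker_id, num_servers)
```

With 4 workers and 2 servers, this assumption would put `gw0` and `gw2` on server 0 and `gw1` and `gw3` on server 1.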
* registry.py changes to dataclass
* Flatten JSON task definitions and update registry
- Remove config wrapper from all task definition JSON files
- Move all config fields to top level alongside task_type and num_agents
- Update registry.py to read flattened structure
- Applied to lab_play/, multiagent/, and unbounded/ directories
* Fix remaining config reference in get_environment_info
- Update get_environment_info to use flattened task_data structure
- Remove reference to task_data['config'] which no longer exists
* Fix TaskFactory to work with flattened JSON structure
- Remove dependency on config wrapper in task JSON files
- Extract task config by filtering out task_type and num_agents
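The flattening described above can be sketched as follows. This is an illustration only: `split_task_definition`, the example field names (`goal_description`, `quota`), and the default of one agent are assumptions, not the actual TaskFactory code.

```python
import json

# Keys that stay separate from the task config in the flattened layout.
RESERVED_KEYS = ("task_type", "num_agents")


def split_task_definition(task_data: dict) -> tuple[str, int, dict]:
    """Split a flattened task definition into (task_type, num_agents, config)."""
    task_type = task_data["task_type"]
    # Assumption: a missing num_agents means a single-agent task.
    num_agents = task_data.get("num_agents", 1)
    # Everything that is not a reserved key is treated as task config.
    config = {k: v for k, v in task_data.items() if k not in RESERVED_KEYS}
    return task_type, num_agents, config


# Hypothetical flattened JSON: no "config" wrapper, all fields at the top level.
example = json.loads(
    '{"task_type": "throughput", "num_agents": 1,'
    ' "goal_description": "Build an iron plate factory", "quota": 16}'
)
task_type, num_agents, config = split_task_definition(example)
```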
* Aug 14, 2025 at 13:15
* retain scope
* undo changes
* add back dataclass
* split scopes
* checkpoint
* intermediate
* more changes
* Aug 20, 2025 at 18:13
* model_dump
* Aug 20, 2025 at 18:27
* task_type
* first iteration
* change to support openai api endpoints
* Refactor APIFactory to use OpenAI-compatible endpoints
- Unified all providers to use OpenAI client format
- Eliminated provider-specific conditional branches
- Simplified provider detection using dict ordering
- Removed unused parameters and added missing return
- 90% reduction in code complexity
* Further simplify APIFactory
- Remove redundant MODELS_WITH_IMAGE_SUPPORT array
- Use provider config supports_images instead
- Inline _prepare_messages logic
- Extract _get_reasoning_length helper
- Add missing default return
- 20+ line reduction while maintaining functionality
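A rough sketch of provider detection by dict ordering with a per-provider `supports_images` flag. The marker strings, URLs, environment variable names, and the `detect_provider` helper are all placeholders chosen for illustration; they are not taken from APIFactory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    base_url: str            # OpenAI-compatible endpoint (placeholder URLs below)
    api_key_env: str         # name of the environment variable holding the key
    supports_images: bool = False


# Python dicts preserve insertion order, so more specific markers go first:
# "open-router/gpt-x" contains both "open-router" and "gpt".
PROVIDERS = {
    "open-router": ProviderConfig("https://router.example.com/v1", "ROUTER_API_KEY", True),
    "claude": ProviderConfig("https://claude.example.com/v1", "CLAUDE_API_KEY", True),
    "gpt": ProviderConfig("https://gpt.example.com/v1", "GPT_API_KEY"),
}


def detect_provider(model: str) -> tuple[str, ProviderConfig]:
    """Return the first provider whose marker appears in the model name."""
    for marker, config in PROVIDERS.items():
        if marker in model:
            return marker, config
    raise ValueError(f"no provider configured for model {model!r}")
```

Because the first match wins, no provider-specific conditional branches are needed; adding a provider is one more dict entry in the right position.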
* remove comment
* Inline reasoning length logic
- Remove _get_reasoning_length helper method
- Inline reasoning effort logic in o1/o3 handling
- Keep code simpler and more direct
* add provider sorting for openrouter to get fastest throughput
* add nitro
* add usage tracking
* usage
* undo changes that added logging
* update config paths
* remove offset
* offset
* Aug 20, 2025 at 20:25
* fix run_idx port offset
* make sure a KeyError is raised if there is no port
* fix
* fix: remove duplicate burner mining drill line from crafting statistics
* fix: Remove duplicate 'Useful statistics' from task descriptions
- Statistics already included in system prompt from instance.get_system_prompt()
- Removed CRAFTING_STATISTICS duplication from task goal_description
- Cleaned up unused include_stats parameter in UnboundedThroughputTask
* fix: Remove goal description duplication and restore useful statistics
- Remove duplicate {goal_description} from GYM_AGENT_INSTRUCTIONS template
- Add CRAFTING_STATISTICS to agent.md for system prompt inclusion
- Goal now appears only once in Task section, not duplicated in Instructions
- Statistics are properly included in the system prompt via agent.md
* refactor changes
* add back system prompt
* replace goal description in proper place
* remove redundant goal statement
* whitespace
* move rstrip to after string generation
* feat: Add modular system prompt architecture
Create flexible component-based system prompt generation allowing
agent designs to customize prompts based on specific needs.
Key Features:
• SystemPromptBuilder for flexible prompt composition
• Component-based architecture (task, stats, constraints, patterns)
• Agent-specific optimizations (minimal 100 chars to comprehensive)
• Separation of task logic from prompt generation
• Backward compatibility with existing systems
Components:
- TaskDefinitionComponent: Task-specific instructions
- ProductionStatisticsComponent: Crafting/production rates
- ResponseFormatComponent: Different formats (Gym, MCTS, custom)
- MultiAgentComponent: Coordination instructions when needed
- ImplementationPatternsComponent: Code examples
- ConstraintsComponent: Behavioral rules and limitations
- APIReferenceComponent: Method docs (full or summary)
Changes:
- Enhanced ThroughputTask with build_system_prompt()
- Enhanced FactorioInstance with get_api_documentation()
- New fle.env.system_prompt package with builders and examples
Addresses need for customizable system prompts based on agent design,
similar to how observations can be tailored per agent type.
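To illustrate the component-based idea, here is a minimal sketch. `SketchPromptBuilder` and `TextComponent` are hypothetical stand-ins written for this example; the real SystemPromptBuilder and component classes in fle.env.system_prompt may look quite different.

```python
from dataclasses import dataclass, field
from typing import Protocol


class PromptComponent(Protocol):
    """Anything that can render itself into a section of the system prompt."""

    def render(self) -> str: ...


@dataclass
class TextComponent:
    title: str
    body: str

    def render(self) -> str:
        return f"## {self.title}\n{self.body.strip()}"


@dataclass
class SketchPromptBuilder:
    components: list[PromptComponent] = field(default_factory=list)

    def add(self, component: PromptComponent) -> "SketchPromptBuilder":
        self.components.append(component)
        return self  # returning self allows chained .add() calls

    def build(self) -> str:
        # Sections are joined in the order they were added.
        return "\n\n".join(c.render() for c in self.components)


minimal_prompt = (
    SketchPromptBuilder()
    .add(TextComponent("Task", "Produce 16 iron plates per minute.\n"))
    .add(TextComponent("Constraints", "Do not remove existing entities."))
    .build()
)
```

An agent that needs a shorter prompt would simply add fewer components, which keeps task logic separate from prompt assembly.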
* gpt-5
* set_speed
* from tests
* set_speed
* Select the container by instance_id
- gym registry: uses the instance_id parameter and direct indexing: tcp_ports[instance_id]
- run_eval: passes instance_id=run_idx to gym.make()
- config: added instance_id field to track which container to use
* Fix RCON client disconnection by eliminating duplicate gym.make() calls
- **Root Cause**: Two gym.make() calls were creating separate FactorioInstance objects
  trying to connect to the same container, causing RCON conflicts
- **Problem**:
  - Main process: gym.make() → creates FactorioInstance → connects to container
  - Subprocess: gym.make() → creates ANOTHER FactorioInstance → conflicts!
- **Solution**: Eliminate main process gym.make() by:
  - Getting task directly via TaskFactory.create_task()
  - Generating system prompts via SystemPromptGenerator
  - Only subprocess creates gym environment with correct instance_id
- **Changes**:
  - registry.py: Added instance_id parameter to make_factorio_env()
  - run_eval.py: Removed main process gym.make(), kept subprocess gym.make()
  - config.py: Added instance_id field to track container mapping
- **Result**: Each subprocess now connects to its own container without conflicts
  - run_idx=0 → container 0 (port 27000)
  - run_idx=1 → container 1 (port 27001)
  - run_idx=2 → container 2 (port 27002)
Fixes RCON disconnection errors in multi-container gym environments.
* Remove Path usage and eliminate redundant multiagent instructions
- **Path removal**: Replace Path() with os.path.join() in run_eval.py
  - File path now resolves to: /home/kian/factorio-learning-environment/fle
  - Eliminates Path dependency as requested
- **Redundancy fix**: Remove duplicate multiagent instructions from run_eval.py
  - run_eval.py was duplicating the same multiagent logic as instance.py
  - Now uses basic generator.generate('') in main process
  - Proper agent-specific system prompts handled by instance.get_system_prompt(agent_idx)
  - Eliminates code duplication between run_eval.py and instance.py
- **Result**: Cleaner code with single source of truth for multiagent instructions
* Fix outdated env/src paths in MCP protocol files
- Problem: MCP files were using the non-existent path parent/env/src,
  which resolved to fle/env/protocols/env/src
- Root cause: Legacy path structure assumption
  - Current: fle/env/ contains tools/, instance.py, etc.
  - Old assumption: fle/env/src/ (never existed)
- Solution: Update all MCP files to use the correct path, parent.parent
  - Before: Path(...).parent / env / src → fle/env/protocols/env/src ❌
  - After: Path(...).parent.parent → fle/env ✅
- Files fixed:
  - resources.py: 2 instances fixed
  - tools.py: 2 instances fixed
  - unix_tools.py: 2 instances fixed
  - Removed obsolete env.src. string replacement
- Verification: All paths now correctly point to fle/env/ with tools/ and instance.py
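The before/after path arithmetic can be checked with a few lines of pathlib. The module location `fle/env/protocols/resources.py` is an assumption, chosen only so that `.parent` reproduces the directory named in this message; `PurePosixPath` is used so nothing touches the filesystem.

```python
from pathlib import PurePosixPath

# Assumed location of one of the MCP modules (illustrative only).
module_file = PurePosixPath("fle/env/protocols/resources.py")

# Before: appends env/src below the module's own directory.
old_path = module_file.parent / "env" / "src"

# After: goes one directory up from the module's directory.
new_path = module_file.parent.parent

print(old_path)  # fle/env/protocols/env/src
print(new_path)  # fle/env
```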
* Add container mapping debug prints for 7-env test verification
- Container Discovery: Shows all discovered containers with IPs and ports
- Main Process: Logs each run_idx → instance_id assignment
- Subprocess: Verifies gym.make() uses correct instance_id
- Registry: Shows which container is selected for each instance_id
- Instance: Confirms actual RCON connection details
Debug output will show:
🐳 CONTAINER DISCOVERY: Found X containers
🔍 Container details: Container 0: ip:port
🚀 MAIN PROCESS: Starting run_idx=X with instance_id=X
🎯 SUBPROCESS X: Creating gym environment with instance_id=X
🏭 REGISTRY: Creating FactorioInstance for instance_id=X
📡 REGISTRY: Selecting container X: ip:port
🔌 INSTANCE: Successfully connected to ip at tcp/port
✅ SUBPROCESS X: Connected to ip:port
This will verify the fix for RCON conflicts across 7 parallel environments.
* Container selection fixes for multi-terminal runs
- Add --instance_offset CLI flag (or FLE_INSTANCE_OFFSET env) to shift instance_id per terminal
- Normalize instance_id modulo number of containers inside registry (supports any offset)
- Keep detailed debug prints for discovery, selection, and connection
This ensures parallel runs across terminals map to distinct containers.
* Make container selection explicit via instance_id offset
- Remove automatic modulo normalization in registry
- Require valid instance_id; raise if out of range
- Keep --instance_offset (and FLE_INSTANCE_OFFSET) to compose instance_id = run_idx + offset
- Debug prints reflect explicit selection
This matches previous trajectory runner behavior and avoids unintended cross-terminal overlap.
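A minimal sketch of the explicit selection rule, assuming `tcp_ports` is a list ordered by container index. `select_container` is a hypothetical helper, and `IndexError` is an assumed choice of exception; this message only says the registry raises when instance_id is out of range.

```python
def select_container(run_idx: int, offset: int, tcp_ports: list[int]) -> tuple[int, int]:
    """Compose instance_id = run_idx + offset and pick that container's port."""
    instance_id = run_idx + offset
    # No modulo wrap-around: an out-of-range id is an error. Negative ids are
    # rejected explicitly because Python lists would otherwise index from the end.
    if not 0 <= instance_id < len(tcp_ports):
        raise IndexError(
            f"instance_id {instance_id} out of range for {len(tcp_ports)} containers"
        )
    return instance_id, tcp_ports[instance_id]


ports = [27000, 27001, 27002, 27003]
first_terminal = select_container(run_idx=1, offset=0, tcp_ports=ports)
second_terminal = select_container(run_idx=1, offset=2, tcp_ports=ports)
```

Here two terminals running the same run_idx with offsets 0 and 2 land on containers 1 and 3, so they do not overlap.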
* Centralize CLI parsing in fle/run.py
- Refactor run_eval.main to accept params (config_path, offset) and remove argparse
- Extend fle/run.py to parse --instance_offset and pass to run_eval
- Keep defaults for direct invocation of run_eval
This consolidates argument parsing in a single entrypoint as requested.
* Fix CLI offset parsing: use --offset in fle/run.py to pass to run_eval
* remove fluff
* remove fluff
* remove fluff
* put back things removed by mistake
* mcp: use importlib.resources.files('fle')/env instead of __file__-based execution_path; aligns with pkg-aware path used in run_eval and CLI
* run_eval: include multi-agent instructions in SystemPromptGenerator input to match instance.get_system_prompt
* unify system prompt construction: add SystemPromptGenerator.generate_for_agent(agent_idx, num_agents); use in instance.get_system_prompt and run_eval
* paths: one-line importlib.resources.files('fle')/env; unix_tools: pkg-aware tools base; instance.get_system_prompt uses generate_for_agent
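For reference, a small sketch of the package-aware lookup pattern. `package_dir` is a hypothetical helper; since the `fle` package may not be installed where this is run, the example resolves a directory inside the standard library's `email` package as a stand-in for `files('fle') / 'env'`.

```python
from importlib.resources import files


def package_dir(package: str, *parts: str):
    """Resolve a path inside an importable package without relying on __file__."""
    location = files(package)  # Traversable rooted at the package (Python 3.9+)
    for part in parts:
        location = location / part
    return location


# Stand-in for files('fle') / 'env': the stdlib package "email" has a "mime" subpackage.
mime_dir = package_dir("email", "mime")
```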
* patch
* num_agents