Compare commits


11 Commits

Author SHA1 Message Date
Jose Luis Montañes Ojados
9610957f1b add turingcomplete cpu compiler 2026-03-03 01:18:47 +01:00
Jose Luis Montañes Ojados
d8b4f9b2ea Add class support with constructors, fields, and methods 2026-02-19 04:24:44 +01:00
Jose Luis Montañes Ojados
f2e90efc16 Add user-defined functions with call frames
Implement fn/return across the full pipeline:
- Lexer: TOK_FN, TOK_RETURN keywords
- Parser: NODE_FN_DEF, NODE_RETURN AST nodes
- Compiler: FunctionEntry table, inline compilation with jump-over and backpatching
- VM: CallFrame stack with variable snapshot for scoped calls and OP_RETURN
2026-02-18 03:16:54 +01:00
Jose Luis Montañes Ojados
da9bb6ca62 Complete VM parity with eval and fix operator precedence
VM: add string concatenation in OP_ADD, len() built-in, multi-arg
print/println, undefined variable detection, and GC via OP_NOP.
Parser: fix operator precedence by splitting into parse_expr (+,-)
and parse_term (*,/) so 8 + 2 * 4 = 16 instead of 40.
Compiler: emit OP_NOP at start of NODE_BLOCK to trigger GC.
2026-02-18 02:26:44 +01:00
Jose Luis Montañes Ojados
4442886afa Add bytecode VM backend (compile AST to bytecodes + stack-based VM)
New execution mode: ./run vm <file.j> compiles AST to bytecodes and
runs them in a while/switch loop. Ints/floats live on the stack (no
heap allocation), ~7.7x faster than the tree-walking interpreter.

Implements: opcodes, compiler with backpatching (if/while), stack VM
with arithmetic, comparisons, variables, strings, and print/println.
Reorganizes backend into src/backend/eval/ and src/backend/bytecode/.
2026-02-18 01:01:22 +01:00
Jose Luis Montañes Ojados
2c91cbb561 Add // line comments, grouped parentheses, and NODE_NOP
- Parse // comments in parse_statement by consuming tokens until newline
- Add NODE_NOP for no-op statements (comments)
- Support grouped expressions with parentheses in parse_term
2026-02-16 22:41:52 +01:00
Jose Luis Montañes Ojados
21efb0563b Allow function calls in expressions and add len() built-in
- Parse function calls in parse_term() so they work inside expressions
  (e.g. z = len(x), y = len(x) + 1)
- Add len() built-in for string length in evaluator
2026-02-16 18:31:39 +01:00
Jose Luis Montañes Ojados
a36e52a9c3 Replace print/println keywords with generic function call mechanism
- Add NODE_CALL with name, args, and arg_count to parser
- Add TOK_COMMA token and tokenize (, ), , in lexer
- Remove TOK_PRINT/TOK_PRINTLN keywords; print/println are now regular
  identifiers resolved as built-in functions in the evaluator
- Add NODE_CALL debug output in ast_print
2026-02-16 18:14:39 +01:00
Jose Luis Montañes Ojados
dd67537598 Add string type support: literals, concatenation, println, and if eval
- Implement string literal tokenization and parsing (lexer + parser)
- Add string concatenation with + operator in evaluator
- Add println keyword for printing with newline
- Add NODE_IF evaluation in VM
- Fix null terminator bug in string concat buffer
2026-02-16 17:40:12 +01:00
Jose Luis Montañes Ojados
667e5564b8 refactor(allocator): use memset when creating allocator 2026-02-16 05:16:41 +01:00
Jose Luis Montañes Ojados
65220f88c6 Add if statements, unary minus, and fix GC safe points
- Lexer: recognize 'if' as keyword (TOK_IF)
- Parser: add NODE_IF with if_statement union, parse if/cond/body,
  handle unary minus in parse_term as 0 - expr
- Eval: add NODE_IF evaluation, move GC to NODE_BLOCK level to avoid
  destroying temporary values during sub-expression evaluation
2026-02-16 05:12:28 +01:00
31 changed files with 3254 additions and 164 deletions

312
docs/todo.md Normal file

@@ -0,0 +1,312 @@
# TODO: mycpu_v2 backend for j-lang
Implementation roadmap for the code generator targeting the 16-bit CPU v2.
The goal is for `gencode.h` to take the frontend's AST and produce:
- **Assembly text**, human-readable (for debugging)
- **Binary** (a byte array to load into the PROM)
---
## CPU v2 quick reference
```
Instruction: 8 bytes = [OPCODE:16][PARAM1:16][PARAM2:16][TARGET:16]
PC counts 16-bit words → instruction N is at PC = N × 4
Free registers: REG0-REG11 (12 registers, 16 bits each)
Special registers: REG12(RAM_VAL), REG13(RAM_ADDR), REG14(PC), REG15(I/O)
Addressing modes (encoded in the opcode):
  base + 0x00 = register, register
  base + 0x40 = immediate, register
  base + 0x80 = register, immediate
  base + 0xC0 = immediate, immediate
RAM:
  Read: write the address to REG13 → REG12 updates automatically
  Write: REG12 = value, REG13 = addr → RSTR (0x18)
```
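The layout above can be sketched in C as follows. This is only an illustration: the names `AddrMode`, `encode_opcode`, `encode_instruction` and `pc_of_instruction` are hypothetical, and the little-endian byte order inside each 16-bit word is an assumption, since the reference does not state how words are laid out in the PROM.

```c
#include <assert.h>
#include <stdint.h>

/* Addressing-mode offsets added to the base opcode (from the reference above). */
typedef enum {
    MODE_REG_REG = 0x00, /* register, register   */
    MODE_IMM_REG = 0x40, /* immediate, register  */
    MODE_REG_IMM = 0x80, /* register, immediate  */
    MODE_IMM_IMM = 0xC0  /* immediate, immediate */
} AddrMode;

/* Combine a base opcode (e.g. ADD = 0x00, EQ = 0x10) with an addressing mode. */
uint16_t encode_opcode(uint16_t base, AddrMode mode) {
    assert(base < 0x40); /* base opcodes must not collide with the mode bits */
    return (uint16_t)(base + mode);
}

/* Write one instruction as four 16-bit words (8 bytes) into out.
   ASSUMPTION: little-endian byte order within each word. */
void encode_instruction(uint16_t opcode, uint16_t p1, uint16_t p2,
                        uint16_t target, uint8_t out[8]) {
    const uint16_t words[4] = { opcode, p1, p2, target };
    for (int i = 0; i < 4; i++) {
        out[i * 2]     = (uint8_t)(words[i] & 0xFF);
        out[i * 2 + 1] = (uint8_t)(words[i] >> 8);
    }
}

/* PC value of the N-th instruction: the PC counts 16-bit words. */
uint16_t pc_of_instruction(uint16_t index) {
    return (uint16_t)(index * 4);
}
```

For example, `encode_opcode(0x00, MODE_IMM_IMM)` yields `0xC0`, the opcode used for loading an immediate with `ADD #value, #0, REGn`.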
### Useful pseudo-instructions
```asm
MOV #value, REGn → ADD #value, #0, REGn (opcode 0xC0)
MOV REGa, REGb   → ADD REGa, #0, REGb (opcode 0x80)
JMP #addr        → EQ #0, #0, addr (opcode 0xD0, always true)
NOP              → ADD #0, #0, REG0 (opcode 0xC0)
```
---
## Register convention
```
REG0-REG3   → Temporaries for expression evaluation (expression stack)
REG4-REG5   → Auxiliary (spill for deep expressions)
REG6-REG11  → Free / reserved for future use (frame pointer, etc.)
REG12       → RAM VALUE (special, do not touch directly)
REG13       → RAM ADDR (special, do not touch directly)
REG14       → PC (special)
REG15       → I/O (special)
```
## Variable storage
All variables live in **RAM**. A name→address table is kept at compile time.
```asm
; Read variable 'x' (address addr) → REGn
ADD #addr, #0, REG13 ; REG13 = addr
ADD REG12, #0, REGn ; REGn = RAM[addr] (automatic read)
; Write variable 'x' (value in REGn, address addr)
ADD #addr, #0, REG13 ; REG13 = addr
ADD REGn, #0, REG12 ; REG12 = value
RSTR ; RAM[addr] = REG12
```
---
## Phase 0: Emitter infrastructure
**Goal**: Base structures and functions for emitting instructions.
- [ ] `Instruction` struct (opcode, param1, param2, target)
- [ ] Instruction buffer (dynamic array where instructions accumulate)
- [ ] Function `emit(opcode, p1, p2, target)`: appends an instruction to the buffer
- [ ] Variable table: name→RAM_address mapping (compile time)
- [ ] Function `lookupOrCreateVar(name)`: looks up or assigns a RAM address
- [ ] Label/backpatching system:
  - [ ] `emitPlaceholder()` → emits an instruction with target=0, returns its index
  - [ ] `patchTarget(index, target)` → fills in the target of an already emitted instruction
  - [ ] `currentAddr()` → current position (instruction number)
- [ ] ASM output: walk the buffer → readable text with mnemonics
- [ ] Binary output: walk the buffer → byte array (8 bytes/instruction)
**Criterion**: Emit hardcoded instructions and inspect the generated ASM text and binary.
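A minimal sketch of this infrastructure, reusing the function names from the checklist. The `Emitter` struct, the fixed capacities, and the choice of assigning RAM addresses sequentially from 0 are assumptions made for illustration; the checklist itself asks for a dynamic array and does not fix an address policy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_INSTR 1024 /* ASSUMPTION: fixed capacity instead of a dynamic array */
#define MAX_VARS 256

typedef struct { uint16_t opcode, param1, param2, target; } Instruction;

typedef struct {
    Instruction code[MAX_INSTR];
    int count;
    const char *var_names[MAX_VARS]; /* ASSUMPTION: table index = RAM address */
    int var_count;
} Emitter;

/* Append an instruction and return its index in the buffer. */
int emit(Emitter *e, uint16_t opcode, uint16_t p1, uint16_t p2, uint16_t target) {
    assert(e->count < MAX_INSTR);
    e->code[e->count] = (Instruction){ opcode, p1, p2, target };
    return e->count++;
}

/* Emit with target = 0, to be filled in later by patchTarget(). */
int emitPlaceholder(Emitter *e, uint16_t opcode, uint16_t p1, uint16_t p2) {
    return emit(e, opcode, p1, p2, 0);
}

/* Fill in the target of an already emitted instruction. */
void patchTarget(Emitter *e, int index, uint16_t target) {
    assert(index >= 0 && index < e->count);
    e->code[index].target = target;
}

/* Current position, as an instruction number (not a PC value). */
int currentAddr(const Emitter *e) { return e->count; }

/* Return the RAM address of name, assigning the next free one if it is new.
   The caller must keep the name string alive. */
int lookupOrCreateVar(Emitter *e, const char *name) {
    for (int i = 0; i < e->var_count; i++)
        if (strcmp(e->var_names[i], name) == 0) return i;
    assert(e->var_count < MAX_VARS);
    e->var_names[e->var_count] = name;
    return e->var_count++;
}
```

Note that `currentAddr()` returns an instruction index; jump targets still have to be multiplied by 4 because the PC counts 16-bit words.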
---
## Phase 1: Constants, assignments and print
**Goal**: Compile `x = 42` and `print x`.
- [ ] Compile `NODE_INT_LIT` → load an immediate into REG[depth]
```asm
ADD #42, #0, REG0 ; MOV #42, REG0
```
- [ ] Compile `NODE_ASSIGN` → evaluate expr → REG0, store to RAM
```asm
; (result already in REG0)
ADD #addr, #0, REG13 ; REG13 = variable address
ADD REG0, #0, REG12 ; REG12 = value
RSTR ; RAM[addr] = value
```
- [ ] Compile `NODE_VAR` → read from RAM into REG[depth]
```asm
ADD #addr, #0, REG13 ; REG13 = address
ADD REG12, #0, REG0 ; REG0 = RAM[addr]
```
- [ ] Compile `NODE_PRINT` → evaluate expr → REG0, copy to I/O
```asm
ADD REG0, #0, REG15 ; OUTPUT = REG0
```
- [ ] Compile `NODE_BLOCK` → iterate and compile each statement
**Test**: `simple.j` (x = 10). `print 10` → writes 10 to REG15.
---
## Phase 2: Arithmetic expressions
**Goal**: Compile `x = 10 + 20 * 3`.
**Strategy**: Register depth counter. Each sub-expression leaves its result in `REG[depth]`.
- [ ] Variable `int reg_depth = 0` for tracking
- [ ] Compile `NODE_BINOP`:
```
compile left → result in REG[depth]
depth++
compile right → result in REG[depth]
depth--
emit OP REG[depth], REG[depth+1], REG[depth]
```
- [ ] Handle depth > 4 → PUSH/POP to the stack (spill)
- [ ] Operator mapping:
  - `+` → ADD (0x00)
  - `-` → SUB (0x01)
  - `*` → MUL (0x02)
  - `/` → DIV (0x03)
**Test**: `sum.j`, `resta.j`. Verify that `2 + 3 * 4` gives 14.
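The depth-counter strategy can be tried out with a small C sketch. Everything here is hypothetical scaffolding: the `Expr` mini-AST, `Gen`, `gen_emit`, `compile_expr` and the `run_alu` checker are not part of the project, spilling is not handled, and division by zero returning 0 is an assumption of the checker only.

```c
#include <assert.h>
#include <stdint.h>

/* Base ALU opcodes and addressing-mode offsets from the CPU v2 reference. */
enum { OP_ADD = 0x00, OP_SUB = 0x01, OP_MUL = 0x02, OP_DIV = 0x03 };
enum { MODE_REG_REG = 0x00, MODE_IMM_IMM = 0xC0 };

/* Hypothetical miniature AST: op == 0 marks an integer literal. */
typedef struct Expr {
    char op; /* '+', '-', '*', '/' or 0 */
    uint16_t value;
    const struct Expr *left, *right;
} Expr;

typedef struct { uint16_t opcode, p1, p2, target; } Instr;
typedef struct { Instr code[64]; int count; int depth; } Gen;

void gen_emit(Gen *g, uint16_t opcode, uint16_t p1, uint16_t p2, uint16_t target) {
    assert(g->count < 64);
    g->code[g->count++] = (Instr){ opcode, p1, p2, target };
}

/* Leaves the value of e in REG[g->depth]. */
void compile_expr(Gen *g, const Expr *e) {
    assert(g->depth < 4); /* REG0-REG3 only; spilling is not sketched here */
    if (e->op == 0) {     /* literal: ADD #value, #0, REG[depth] */
        gen_emit(g, OP_ADD + MODE_IMM_IMM, e->value, 0, (uint16_t)g->depth);
        return;
    }
    compile_expr(g, e->left);  /* result in REG[depth] */
    g->depth++;
    compile_expr(g, e->right); /* result in REG[depth + 1] of the caller */
    g->depth--;
    uint16_t op = e->op == '+' ? OP_ADD : e->op == '-' ? OP_SUB
                : e->op == '*' ? OP_MUL : OP_DIV;
    gen_emit(g, op + MODE_REG_REG, (uint16_t)g->depth,
             (uint16_t)(g->depth + 1), (uint16_t)g->depth);
}

/* Tiny ALU-only interpreter, used just to check the generated code. */
uint16_t run_alu(const Gen *g) {
    uint16_t reg[16] = {0};
    for (int i = 0; i < g->count; i++) {
        const Instr *in = &g->code[i];
        /* Bit 0x40 marks PARAM1 as immediate, bit 0x80 marks PARAM2. */
        uint16_t a = (in->opcode & 0x40) ? in->p1 : reg[in->p1 & 0xF];
        uint16_t b = (in->opcode & 0x80) ? in->p2 : reg[in->p2 & 0xF];
        uint16_t r = 0;
        switch (in->opcode & 0x3F) {
        case OP_ADD: r = (uint16_t)(a + b); break;
        case OP_SUB: r = (uint16_t)(a - b); break;
        case OP_MUL: r = (uint16_t)((uint32_t)a * b); break;
        case OP_DIV: r = b ? (uint16_t)(a / b) : 0; break; /* ASSUMPTION: x/0 = 0 */
        }
        reg[in->target & 0xF] = r;
    }
    return reg[0];
}
```

Compiling the tree for `2 + 3 * 4` this way emits three immediate loads, one `MUL REG1, REG2, REG1` and one `ADD REG0, REG1, REG0`, so precedence is decided entirely by the shape of the AST.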
---
## Phase 3: Comparisons and control flow
**Goal**: Compile `if` and `while`.
### if
- [ ] Compile `NODE_IF`:
```
compile condition left → REG0
compile condition right → REG1
emit INVERSE_CONDITIONAL REG0, REG1, [placeholder]
compile then block
patch placeholder → currentAddr() × 4
```
- [ ] Mapping of **inverse** conditionals (jump if the condition is FALSE):
  - `==` in AST → emit `NEQ` (0x11)
  - `!=` in AST → emit `EQ` (0x10)
  - `<` in AST → emit `GRE` (0x15): jump if >=
  - `>` in AST → emit `LSE` (0x13): jump if <=
### while
- [ ] Compile `NODE_WHILE`:
```
loop_start = currentAddr()
compile condition left → REG0
compile condition right → REG1
emit INVERSE_CONDITIONAL REG0, REG1, [placeholder_exit]
compile body
emit EQ #0, #0, (loop_start × 4) ; unconditional JMP
patch placeholder_exit → currentAddr() × 4
```
### Remember
- **PC = instruction_index × 4** (each instruction = 4 words of 16 bits)
- The unconditional jump is `EQ #0, #0, target` (0xD0, always true)
**Test**: `if.j`, `while.j`. A while loop that counts from 0 to 10.
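As a rough illustration of the backpatching steps, the sketch below hand-compiles a counting loop (`while counter < limit: counter = counter + 1`) into the instruction sequence described above. The names `Emitter`, `emit4`, `inverse_branch` and `compile_count_loop` are hypothetical, and the loop body is fixed rather than compiled from an AST.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t opcode, p1, p2, target; } Instr;
typedef struct { Instr code[256]; int count; } Emitter;

int emit4(Emitter *e, uint16_t op, uint16_t p1, uint16_t p2, uint16_t t) {
    assert(e->count < 256);
    e->code[e->count] = (Instr){ op, p1, p2, t };
    return e->count++;
}

/* Opcode that jumps when the AST comparison is FALSE (register, register
   mode). Returns -1 for comparisons the table above does not cover. */
int inverse_branch(const char *cmp) {
    if (strcmp(cmp, "==") == 0) return 0x11; /* NEQ */
    if (strcmp(cmp, "!=") == 0) return 0x10; /* EQ  */
    if (strcmp(cmp, "<") == 0)  return 0x15; /* GRE */
    if (strcmp(cmp, ">") == 0)  return 0x13; /* LSE */
    return -1;
}

/* Emits: while RAM[var_addr] < limit: RAM[var_addr] = RAM[var_addr] + 1.
   Returns the index of the exit branch that was backpatched. */
int compile_count_loop(Emitter *e, uint16_t var_addr, uint16_t limit) {
    int loop_start = e->count;
    emit4(e, 0xC0, var_addr, 0, 13); /* ADD #addr, #0, REG13  */
    emit4(e, 0x80, 12, 0, 0);        /* ADD REG12, #0, REG0   */
    emit4(e, 0xC0, limit, 0, 1);     /* ADD #limit, #0, REG1  */
    /* Inverse conditional with a placeholder target of 0. */
    int exit_branch = emit4(e, (uint16_t)inverse_branch("<"), 0, 1, 0);
    emit4(e, 0x80, 0, 1, 12);        /* ADD REG0, #1, REG12   */
    emit4(e, 0x18, 0, 0, 0);         /* RSTR: RAM[REG13] = REG12 */
    /* EQ #0, #0, target: unconditional jump back to the condition. */
    emit4(e, 0xD0, 0, 0, (uint16_t)(loop_start * 4));
    /* Backpatch: the exit lands on the first instruction after the loop. */
    e->code[exit_branch].target = (uint16_t)(e->count * 4);
    return exit_branch;
}
```

Both jump targets are instruction indices multiplied by 4, which is the detail the "Remember" notes call out.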
---
## Phase 4: Functions (CALL/RET)
**Goal**: Compile `fn` definitions and calls.
### Calling convention
```
1. Caller pushes arguments onto the stack (left to right)
2. Caller executes CALL #address (pushes PC+1 onto the stack, jumps)
3. Callee pops arguments → local variables in RAM
4. Callee executes the body
5. Callee leaves the result in REG0
6. Callee executes RET (pops PC from the stack, jumps)
7. Caller uses REG0 as the return value
```
### Tasks
- [ ] Compile `NODE_FN_DEF`:
```
emit JMP [placeholder_skip] ; jump over the body
fn_addr = currentAddr()
for each param (right to left):
  emit POP → REGn
  store REGn → RAM[param_addr]
compile body
emit RET
patch placeholder_skip → currentAddr() × 4
register fn_name → fn_addr in the function table
```
- [ ] Compile `NODE_CALL`:
```
for each argument (left to right):
  compile argument → REG0
  emit PUSH REG0
emit CALL #(fn_addr × 4)
; result is left in REG0
```
- [ ] Compile `NODE_RETURN`:
```
compile expression → REG0
emit RET
```
- [ ] Resolve the scope of local variables:
  - **Simple option**: each function has its own RAM range
  - **Advanced option**: frame pointer (base register + offset for locals)
**Test**: `functions.j`, `custom_fn.j`.
---
## Phase 5: Strings and objects (advanced)
**Goal**: Support strings, classes, fields and instances.
- [ ] Strings in RAM: consecutive characters, the variable points to the base address
- [ ] Printing strings: loop that reads each char from RAM → writes it to REG15
- [ ] Instances: a block of RAM with fields, the variable points to the base
- [ ] Fields: fixed offset from the instance base
- [ ] Methods: functions with `self` (instance address) as the first arg
- [ ] Constructor: reserve space in RAM, call `init`
**Note**: Requires a runtime allocator to reserve dynamic memory in RAM.
### Runtime allocator strategy
There are two options, from lower to higher complexity:
**Option A: Bump allocator (recommended to start with)**
The simplest one. A fixed RAM address (e.g. `RAM[0x00FF]`) acts as a "heap pointer" that starts at the end of the static variables. Each allocation advances the pointer. There is no `free`.
```asm
; alloc(size): size in REG1, returns the address in REG0
ADD #0x00FF, #0, REG13 ; REG13 = address of heap_ptr
ADD REG12, #0, REG0 ; REG0 = current heap_ptr (address to return)
ADD REG12, REG1, REG12 ; REG12 = heap_ptr + size
RSTR ; store the new heap_ptr in RAM[0x00FF]
; REG0 = address of the allocated block
```
~4 instructions. Enough for string literals and simple concatenations.
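The same routine can be modeled in C to check the pointer arithmetic before writing it in assembly. This is only a model: `Machine`, `heap_init` and `bump_alloc` are hypothetical names, the RAM size is an assumed value, and the bounds check has no counterpart in the four-instruction ASM version.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 1024       /* ASSUMPTION: RAM size used for this model */
#define HEAP_PTR_ADDR 0x00FF /* fixed RAM cell holding the heap pointer */

typedef struct { uint16_t ram[RAM_WORDS]; } Machine;

/* Point the heap at the first address after the static variables. */
void heap_init(Machine *m, uint16_t heap_start) {
    m->ram[HEAP_PTR_ADDR] = heap_start;
}

/* Mirrors the ASM routine: return the current heap pointer, then advance it. */
uint16_t bump_alloc(Machine *m, uint16_t size) {
    uint16_t block = m->ram[HEAP_PTR_ADDR];           /* REG0 = heap_ptr        */
    assert((uint32_t)block + size <= RAM_WORDS);      /* not present in the ASM */
    m->ram[HEAP_PTR_ADDR] = (uint16_t)(block + size); /* RSTR                   */
    return block;
}
```

Two consecutive allocations never overlap because the pointer only moves forward, which is also why nothing can be freed.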
**Option B: Allocator with metadata (like `allocator.h`, but in CPU v2 ASM)**
Same conceptual design as `src/memory/allocator.h`, but implemented as a routine in CPU v2 assembly:
- Each block in RAM: `[size:16][in_use:16][payload...]`
- A loop that walks the blocks with comparisons + jumps (first-fit)
- `free` sets `in_use = 0`
- ~30-50 CPU v2 instructions
Only needed if strings are going to be freed (reassigning string variables, temporary concatenations).
**Recommendation**: Start with the bump allocator. If `free` is needed later, implement option B using the design of `allocator.h` as a conceptual reference.
**Test**: `str.j`, `classes.j`.
---
## Dependency diagram
```
Phase 0 (infrastructure)
└── Phase 1 (constants, assignment, print)
    └── Phase 2 (arithmetic)
        └── Phase 3 (if/while)
            └── Phase 4 (functions)
                └── Phase 5 (strings/objects)
```
## Verification per phase
| Phase | Test files |
|------|-----------------|
| 1 | `simple.j` |
| 2 | `sum.j`, `resta.j` |
| 3 | `if.j`, `while.j` |
| 4 | `functions.j`, `custom_fn.j` |
| 5 | `str.j`, `classes.j` |
Compare the generated output (ASM + binary) with what the VM produces for the same input.

103
mycpu.md Normal file

@@ -0,0 +1,103 @@
# Motivation
Having completed the game "Turing Complete" and built a working 8-bit CPU with the LEG architecture, I set out to create a compiler for my language, "J-LANG".
# CPU structure
Architecture: 8-bit
PROM: 256 bytes
RAM: 256 bytes
STACK: 256 bytes
## Addresses
0x00 REG0
0x01 REG1
0x02 REG2
0x03 REG3
0x04 REG4 | RAM VALUE
0x05 REG5 | RAM ADDR PTR
0x06 PROGRAM COUNTER
0x07 INPUT/OUTPUT
### Basic registers
From: 0x00
To: 0x03
These registers store 1 byte each.
### RAM-connected registers
These registers act as a "bridge" to the RAM.
0x05 REG5 is the RAM address counter.
0x04 REG4 holds the value at RAM[counter(0x05)].
### Input/output
One byte of input and one byte of output for interacting with the game "Turing Complete".
## OPCODES
Each instruction on this CPU must be exactly 4 bytes long.
Each input supports two addressing modes: register or immediate.
- Register: the byte names the register that holds the value
- Immediate: the byte is used directly as the value
The structure of an instruction is:
[OPCODE] [INPUT0] [INPUT1] [TARGET]
The last byte, "target", indicates the register in which the instruction's result is stored.
In conditional opcodes, the last byte (TARGET) is the value written to the PROGRAM_COUNTER if the condition holds.
======== ALU ========
0x00 ADD r0 r1 t0
0x01 SUB r0 r1 t0
0x02 AND r0 r1 t0
0x03 OR r0 r1 t0
0x04 NOT r0 r1 t0
0x05 XOR r0 r1 t0
---------------------
0x40 ADD r0 #1 t0 ; # means immediate
0x41 SUB r0 #1 t0
0x42 AND r0 #1 t0
0x43 OR r0 #1 t0
0x44 NOT r0 #1 t0
0x45 XOR r0 #1 t0
--------------------
0x80 ADD #0 r1 t0
0x81 SUB #0 r1 t0
0x82 AND #0 r1 t0
0x83 OR #0 r1 t0
0x84 NOT #0 r1 t0
0x85 XOR #0 r1 t0
--------------------
0xC0 ADD #0 #1 t0
0xC1 SUB #0 #1 t0
0xC2 AND #0 #1 t0
0xC3 OR #0 #1 t0
0xC4 NOT #0 #1 t0
0xC5 XOR #0 #1 t0
====== CONDITIONAL ======
0x30 EQ r0 r1 pc ; equal
0x31 NEQ r0 r1 pc ; not_equal
0x32 LS r0 r1 pc ; less
0x33 LSE r0 r1 pc ; less_or_equal
0x34 GR r0 r1 pc ; greater
0x35 GRE r0 r1 pc ; greater_or_equal
======== RAM ========
0xE0 RAM_ST ?? ?? ?? ; stores the value of REG4 at the position in REG5: RAM[REG5] = REG4
0xE1 RAM_LD ?? ?? ?? ; not actually used; the RAM always outputs to address 0x04
====== STACK ======
0x22 PUSH r0 ?? t0 ; ?? is unused but must be present; t0 overwrites that address with 0
0x23 POP ?? ?? t0
0xE2 PUSH #0 ?? t0
0xE3 POP ?? ?? t0
====== FUNCTIONS ======
0x08 CALL r0 ?? ?? ; pushes pc+1 onto the stack and sets pc to the value held in r0
0x09 RET ?? ?? ?? ; pops the stack and writes the value to the pc
0x88 CALL #0
0x89 RET ?? ?? ??

178
mycpu_v2.md Normal file

@@ -0,0 +1,178 @@
# Specifications
Architecture: 16-bit
Instruction size: 64 bits (four 16-bit words)
PROM: Unlimited
RAM: 1 kB - 20 kB
STACK: 256 - Unlimited
# Registers
Each register can store 16 bits
| ADDR | NAME | NOTES |
| ---- | -------- | --------- |
| 0x00 | REG0 | |
| 0x01 | REG1 | |
| 0x02 | REG2 | |
| 0x03 | REG3 | |
| 0x04 | REG4 | |
| 0x05 | REG5 | |
| 0x06 | REG6 | |
| 0x07 | REG7 | |
| 0x08 | REG8 | |
| 0x09 | REG9 | |
| 0x0A | REG10 | |
| 0x0B | REG11 | |
| 0x0C | REG12 | RAM VALUE |
| 0x0D | REG13 | RAM ADDR |
| 0x0E | PC | |
| 0x0F | IN/OUT | |
# Opcodes
Instructions on this CPU have a total size of 8 bytes, that is, 4 parameters of 16 bits each.
[OPCODE] [PARAM1] [PARAM2] [TARGET1]
PARAM1 and PARAM2 support 2 addressing modes:
- Register mode
- Immediate mode
TARGET1 indicates the register where the result will be stored.
## ALU
| OPCODE | ADDR | PARAM1 | PARAM2 | TARGET1 | DESCRIPTION |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| ADD | 0x00 | R0 | R1 | T1 | |
| SUB | 0x01 | R0 | R1 | T1 | |
| MUL | 0x02 | R0 | R1 | T1 | |
| DIV | 0x03 | R0 | R1 | T1 | |
| AND | 0x04 | R0 | R1 | T1 | |
| OR | 0x05 | R0 | R1 | T1 | |
| NOT | 0x06 | R0 | R1 | T1 | |
| NAND | 0x07 | R0 | R1 | T1 | |
| NOR | 0x08 | R0 | R1 | T1 | |
| XOR | 0x09 | R0 | R1 | T1 | |
| XNOR | 0x0A | R0 | R1 | T1 | |
| NEG | 0x0B | R0 | R1 | T1 | |
| - | 0x0C | | | | |
| - | 0x0D | | | | |
| - | 0x0E | | | | |
| - | 0x0F | | | | |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| ADD | 0x40 | #0 | R1 | T1 | |
| SUB | 0x41 | #0 | R1 | T1 | |
| MUL | 0x42 | #0 | R1 | T1 | |
| DIV | 0x43 | #0 | R1 | T1 | |
| AND | 0x44 | #0 | R1 | T1 | |
| OR | 0x45 | #0 | R1 | T1 | |
| NOT | 0x46 | #0 | R1 | T1 | |
| NAND | 0x47 | #0 | R1 | T1 | |
| NOR | 0x48 | #0 | R1 | T1 | |
| XOR | 0x49 | #0 | R1 | T1 | |
| XNOR | 0x4A | #0 | R1 | T1 | |
| NEG | 0x4B | #0 | R1 | T1 | |
| - | 0x4C | | | | |
| - | 0x4D | | | | |
| - | 0x4E | | | | |
| - | 0x4F | | | | |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| ADD | 0x80 | R0 | #1 | T1 | |
| SUB | 0x81 | R0 | #1 | T1 | |
| MUL | 0x82 | R0 | #1 | T1 | |
| DIV | 0x83 | R0 | #1 | T1 | |
| AND | 0x84 | R0 | #1 | T1 | |
| OR | 0x85 | R0 | #1 | T1 | |
| NOT | 0x86 | R0 | #1 | T1 | |
| NAND | 0x87 | R0 | #1 | T1 | |
| NOR | 0x88 | R0 | #1 | T1 | |
| XOR | 0x89 | R0 | #1 | T1 | |
| XNOR | 0x8A | R0 | #1 | T1 | |
| NEG | 0x8B | R0 | #1 | T1 | |
| - | 0x8C | | | | |
| - | 0x8D | | | | |
| - | 0x8E | | | | |
| - | 0x8F | | | | |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| ADD | 0xC0 | #0 | #1 | T1 | |
| SUB | 0xC1 | #0 | #1 | T1 | |
| MUL | 0xC2 | #0 | #1 | T1 | |
| DIV | 0xC3 | #0 | #1 | T1 | |
| AND | 0xC4 | #0 | #1 | T1 | |
| OR | 0xC5 | #0 | #1 | T1 | |
| NOT | 0xC6 | #0 | #1 | T1 | |
| NAND | 0xC7 | #0 | #1 | T1 | |
| NOR | 0xC8 | #0 | #1 | T1 | |
| XOR | 0xC9 | #0 | #1 | T1 | |
| XNOR | 0xCA | #0 | #1 | T1 | |
| NEG | 0xCB | #0 | #1 | T1 | |
| - | 0xCC | | | | |
| - | 0xCD | | | | |
| - | 0xCE | | | | |
| - | 0xCF | | | | |
## CONDITIONALS
In conditionals, TARGET1 is the PC (Program Counter) address to jump to if the condition holds.
| OPCODE | ADDR | PARAM1 | PARAM2 | TARGET1 | DESCRIPTION |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| EQ | 0x10 | R0 | R1 | T1 | equal |
| NEQ | 0x11 | R0 | R1 | T1 | not equal |
| LS | 0x12 | R0 | R1 | T1 | less |
| LSE | 0x13 | R0 | R1 | T1 | less or eq |
| GR | 0x14 | R0 | R1 | T1 | greater |
| GRE | 0x15 | R0 | R1 | T1 |greater or eq|
| | 0x16 | R0 | R1 | T1 | |
| | 0x17 | R0 | R1 | T1 | |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| EQ | 0x50 | #0 | R1 | T1 | equal |
| NEQ | 0x51 | #0 | R1 | T1 | not equal |
| LS | 0x52 | #0 | R1 | T1 | less |
| LSE | 0x53 | #0 | R1 | T1 | less or eq |
| GR | 0x54 | #0 | R1 | T1 | greater |
| GRE | 0x55 | #0 | R1 | T1 |greater or eq|
| | 0x56 | #0 | R1 | T1 | |
| | 0x57 | #0 | R1 | T1 | |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| EQ | 0x90 | R0 | #1 | T1 | equal |
| NEQ | 0x91 | R0 | #1 | T1 | not equal |
| LS | 0x92 | R0 | #1 | T1 | less |
| LSE | 0x93 | R0 | #1 | T1 | less or eq |
| GR | 0x94 | R0 | #1 | T1 | greater |
| GRE | 0x95 | R0 | #1 | T1 |greater or eq|
| | 0x96 | R0 | #1 | T1 | |
| | 0x97 | R0 | #1 | T1 | |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| EQ | 0xD0 | #0 | #1 | T1 | equal |
| NEQ | 0xD1 | #0 | #1 | T1 | not equal |
| LS | 0xD2 | #0 | #1 | T1 | less |
| LSE | 0xD3 | #0 | #1 | T1 | less or eq |
| GR | 0xD4 | #0 | #1 | T1 | greater |
| GRE | 0xD5 | #0 | #1 | T1 |greater or eq|
| | 0xD6 | #0 | #1 | T1 | |
| | 0xD7 | #0 | #1 | T1 | |
## CONTROL UNIT
| OPCODE | ADDR | PARAM1 | PARAM2 | TARGET1 | DESCRIPTION |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| RSTR | 0x18 | -- | -- | - | |
| PUSH | 0x19 | R1 | -- | - | |
| POP | 0x1A | -- | -- | T1 | |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| PUSH | 0x59 | #1 | -- | - | |
## FUNCTIONS
| OPCODE | ADDR | PARAM1 | PARAM2 | TARGET1 | DESCRIPTION |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| CALL | 0x20 | R1 | -- | - | |
| RET | 0x21 | -- | -- | - | |
| HALT | 0x22 | -- | -- | - | |
| ------ | ---- | ------ | ------ | ------- | ----------- |
| CALL | 0x60 | #1 | -- | - | |

11
projects/classes.j Normal file

@@ -0,0 +1,11 @@
class Dog:
fn init(self, name):
self.name = name
fn bark(self):
println("guau!")
d = Dog("ahi te va")
x = d.bark()
println("Hola ", d.name)
debugHeap()

5
projects/comment.j Normal file

@@ -0,0 +1,5 @@
// Esto es un comentario
println("Hello World!")
// Otro comentario mas
println(40)

8
projects/custom_fn.j Normal file

@@ -0,0 +1,8 @@
x = "Hello world!"
fn suma(x, y):
fn pow(z):
return z * z
return x + pow(y)
println(suma(2, 2))

3
projects/functions.j Normal file

@@ -0,0 +1,3 @@
x = "Hello"
y = 2 * (4 - 2)
println(y)

4
projects/if.j Normal file

@@ -0,0 +1,4 @@
x = 20
if x < 10:
print x
print -300

3
projects/mycpu/assign.j Normal file

@@ -0,0 +1,3 @@
x = 10
y = 512
z = x + y


@@ -0,0 +1,6 @@
g = 2
fn suma(x, y):
return x + y
x = suma(5, 2) - g + 1

12
projects/mycpu/ifs.j Normal file

@@ -0,0 +1,12 @@
counter = 0
fn inc():
counter = counter + 1
fn main():
if counter < 30:
inc()
main()
main()

4
projects/mycpu/while.j Normal file

@@ -0,0 +1,4 @@
counter = 0
while counter < 65000:
counter = counter + 1

12
projects/str.j Normal file

@@ -0,0 +1,12 @@
x = 0
while x < 10:
x = x + 1
if x > 9:
println "fin"
x = "a"
y = x * 1
z = y + 2
println "a" + z

3
projects/test.j Normal file

@@ -0,0 +1,3 @@
x = 5
z = x > 2
print x + z

2
projects/vm_simple.j Normal file

@@ -0,0 +1,2 @@
x = 8 + 2 * 4
print(x, end="\n")


@@ -1,4 +1,5 @@
 x = 0
-while x < 100000000:
+while x < 10000000:
 x = x + 1
-print x
+print(x)
+debugHeap()

266
readme.md

@@ -1,52 +1,240 @@
# j-lang
-The idea behind j-lang is to create a Python-like "proto-language" implemented from scratch, to validate and learn more about memory management.
-Currently `mem-heap\src\allocator.h` already contains a nearly working Memory Allocator implementation.
-## 🗺️ Roadmap: Proto-Language Project
-This roadmap goes from the lowest level (memory) to the highest (running code).
-### Phase 1: The Foundation (Memory Management) 🏗️
-Goal: Have our own malloc and free that manage compact metadata.
-Status: You are here!
-Key tasks:
-- [ ] Finish CMA_malloc with the compacted header (Size + Marked + InUse).
-- [ ] Implement a CMA_free function that can free a specific block.
-### Phase 2: The Object Model 📦
-Goal: Define what a number, a string or a list looks like inside your C memory.
-Connection: Every object in your language will be a C struct that begins with your CMA_metadata.
-Key tasks:
-- [ ] Create an enum for the types (ENTERO, STRING, LISTA).
-- [ ] Define the generic Object struct that wraps your data.
-### Phase 3: The Front-End (Lexer and Parser) 📖
-Goal: Turn the source code text into something C understands.
-Key tasks:
-- [ ] Lexer (Tokenizer): Break the text x = 10 into tokens: [ID:x], [OP:=], [NUM:10].
-- [ ] Parser: Organize those tokens into an Abstract Syntax Tree (AST). For example, an "Assignment" node with one child "x" and another "10".
-### Phase 4: The Engine (Evaluator or VM) ⚙️
-Goal: Walk the tree and "do" what it says.
-Key tasks:
-- [ ] Create a recursive function eval(node) that executes the logic.
-If it is a SUMA node, add the children. If it is an IMPRIMIR node, print to the screen.
-### Phase 5: The Garbage Collector 🧹
-Goal: Automate cleanup.
-Key tasks:
-- [ ] Implement Mark: Walk all objects reachable from your variables and set the Marked bit to 1.
-- [ ] Implement Sweep: Walk the whole heap linearly (using your next_block function). If a block has Marked == 0 and InUse == 1, call CMA_free.
A proto-language with Python-inspired syntax, implemented from scratch in C. The goal is to learn about memory management, tokenization, parsing and evaluation of a programming language.
## Current status
The 5 phases of the interpreter are implemented and working:
```
Source code (.j)
      |
   [LEXER]    src/frontend/lexer.h
      |
    Tokens
      |
   [PARSER]   src/frontend/parser.h
      |
     AST
      |
   [EVAL]     src/vm/eval.h
      |
Execution + GC
```
### What works
- **Variables and assignment:** `x = 10`
- **Arithmetic:** `+`, `-`, `*`, `/` with integers
- **Comparisons:** `<`, `>`
- **Strings:** literals, concatenation with `+`, `len()`
- **Control flow:** `if` and `while` with indented blocks (Python style)
- **Built-in functions:** `print()`, `println()`, `len()`
- **Function calls** with multiple arguments separated by `,`
- **Parenthesized expressions:** `2 * (4 - 2)`
- **Negative numbers:** `-300`
- **Comments:** `// esto es un comentario`
### Example
```
x = 0
while x < 10:
    x = x + 1
if x > 9:
    println("fin")
```
## Project structure
-- vm: j-lang virtual machine
-- projects: folder with j-lang scripts
```
src/
  frontend/
    lexer.h       Tokenizer: text -> tokens
    parser.h      Parser: tokens -> AST
  memory/
    allocator.h   Custom memory allocator (simulated heap)
    gc.h          Garbage collector (mark-and-sweep)
  objects/
    object.h      Object model (int, float, string, list)
  vm/
    eval.h        Evaluator: walks the AST and executes it
  main.c          Entry point
projects/         Example scripts in .j
docs/
  roadmap.md      Detailed implementation roadmap
```
### Memory allocator
A simulated heap over a byte array, with per-block metadata (`size`, `in_use`, `marked`). It supports allocation, freeing, reuse of free blocks (first-fit), and automatic growth when it runs out of space.
### Garbage collector
Mark-and-sweep: marks the objects reachable from the environment's variables, sweeps the unmarked ones, and merges contiguous free blocks.
### Object model
Language values are represented as an `Object` with a tagged union. Supported types: `OBJ_INT`, `OBJ_FLOAT`, `OBJ_STRING`, `OBJ_LIST`, `OBJ_NONE`. Objects live in the custom heap and are referenced by offset (not absolute pointers).
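To make the "tagged union referenced by offset" idea concrete, here is a small sketch. Only the `OBJ_*` tag names come from the README; the field layout, `Heap`, `ObjRef`, `heap_store` and `heap_load` are assumptions for illustration and are not the project's actual `object.h` or `allocator.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { OBJ_INT, OBJ_FLOAT, OBJ_STRING, OBJ_LIST, OBJ_NONE } ObjType;

/* Hypothetical layout: a tag plus a union holding the payload. */
typedef struct {
    ObjType type;
    union {
        int32_t int_val;
        double float_val;
        struct { uint32_t length; uint32_t chars_offset; } string;
    } as;
} Object;

/* Simulated heap over a byte array; objects are addressed by offset. */
typedef struct { uint8_t bytes[4096]; size_t used; } Heap;
typedef uint32_t ObjRef; /* offset from the start of the heap */

/* Copy an object into the heap and return its offset. */
ObjRef heap_store(Heap *h, const Object *obj) {
    assert(h->used + sizeof *obj <= sizeof h->bytes);
    ObjRef ref = (ObjRef)h->used;
    memcpy(h->bytes + ref, obj, sizeof *obj);
    h->used += sizeof *obj;
    return ref;
}

/* Read an object back by offset (memcpy avoids alignment issues). */
Object heap_load(const Heap *h, ObjRef ref) {
    Object obj;
    assert((size_t)ref + sizeof obj <= h->used);
    memcpy(&obj, h->bytes + ref, sizeof obj);
    return obj;
}
```

One benefit of offsets over absolute pointers is that references stay valid if the backing array is moved when the heap grows.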
## Build and run
```bash
gcc src/main.c -o run
./run projects/sum.j
```
---
## Roadmap: what is missing to make a 2D game with JLang
To write a 2D game along the lines of "move a character around the screen, shoot, collisions" in JLang, these building blocks would be needed:
### 1. User-defined functions
The most urgent item. Without functions nothing can be organized.
```
fn update(dt):
    player_x = player_x + speed * dt
fn draw():
    draw_rect(player_x, player_y, 32, 32)
```
This implies: a new `fn` token, a `NODE_FUNC_DEF` node in the AST, storing the function body in the environment, and a scoping mechanism (local vs global variables).
### 2. Return
Functions need to return values.
```
fn distance(x1, y1, x2, y2):
    dx = x1 - x2
    dy = y1 - y2
    return sqrt(dx * dx + dy * dy)
```
### 3. Structs or classes
To represent game entities (player, enemies, bullets...).
```
class Entity:
    x = 0
    y = 0
    w = 32
    h = 32
player = Entity()
player.x = 100
player.y = 200
```
This implies: field access with `.`, a constructor, and storing the class definition as one more object in the heap.
### 4. Working lists
Lists already exist as a type (`OBJ_LIST`) but there is no syntax to use them. They are needed to handle collections of entities.
```
enemies = [Enemy(), Enemy(), Enemy()]
append(enemies, Enemy())
i = 0
while i < len(enemies):
    update(enemies[i])
    i = i + 1
```
This implies: `[...]` syntax, index access `lista[i]`, `append()`, and `len()` for lists.
### 5. Else / elif
Essential for game logic.
```
if key == "left":
    player_x = player_x - speed
elif key == "right":
    player_x = player_x + speed
else:
    speed = 0
```
### 6. For loops
A cleaner way to iterate than `while`.
```
for enemy in enemies:
    draw_rect(enemy.x, enemy.y, enemy.w, enemy.h)
```
### 7. Missing operators
- `%` (modulo): useful for cyclic animations, wrapping
- `==`, `!=` (already tokenized but not fully evaluated)
- `<=`, `>=`
- `and`, `or`, `not`: logical operators
- `+=`, `-=`: syntactic sugar
### 8. Working floats
The `OBJ_FLOAT` type exists but cannot be used from the language. A game needs floating-point arithmetic for positions, velocities, delta time, etc.
```
player_x = 100.0
speed = 2.5
player_x = player_x + speed * dt
```
### 9. Graphics library (FFI to C)
The critical point. JLang needs to be able to call a C graphics library such as SDL2 or raylib. There are two paths:
**Option A: Built-in functions (easier)**
Register C functions directly in the evaluator, as is already done with `print`:
```c
// In eval, next to print/println:
if (strcmp(name, "draw_rect") == 0) { SDL_RenderFillRect(...); }
if (strcmp(name, "key_pressed") == 0) { ... }
```
**Option B: Generic FFI (more ambitious)**
A system for binding arbitrary C functions from JLang.
The minimum set of functions for a game would be:
| Function | Description |
|---|---|
| `create_window(w, h, title)` | Create a window |
| `clear()` | Clear the screen |
| `draw_rect(x, y, w, h, r, g, b)` | Draw a rectangle |
| `draw_image(path, x, y)` | Draw an image/sprite |
| `present()` | Show the frame |
| `key_pressed(key)` | Query a key |
| `get_dt()` | Delta time between frames |
| `random(min, max)` | Random number |
### 10. Math functions
`sqrt()`, `sin()`, `cos()`, `abs()`, `random()`. All of them can be registered as built-ins that call into `math.h`.
### Suggested implementation order
```
1. User-defined functions + return (nothing can be done without this)
2. Else / elif
3. Working floats
4. Missing operators (%, <=, >=, and, or)
5. Lists with syntax ([], indexing, append)
6. For loops
7. Structs or classes
8. Graphics built-ins (SDL2/raylib)
9. Math functions
10. Working 2D game
```
Steps 1-7 are pure language work (lexer/parser/eval). Step 8 is where JLang touches the real world: linking against SDL2 or raylib at build time and exposing the functions as built-ins in the evaluator.

BIN
run.exe

Binary file not shown.


@@ -0,0 +1,470 @@
#ifndef JLANG_COMPILER_H
#define JLANG_COMPILER_H
#include "../../frontend/parser.h"
#include "opcodes.h"
#include <stdio.h>  // printf
#include <stdlib.h> // exit
#include <string.h>

typedef struct {
  char *method_name;
  int entry_point;
  int param_count;
  char **param_names;
} MethodEntry;

typedef struct {
  char *name;
  MethodEntry methods[16];
  int method_count;
} ClassEntry;

typedef struct {
  char *name;
  int entry_point; // index of the first instruction
  int param_count;
  char **param_names; // parameter names
} FunctionEntry;

typedef struct {
  Instruction code[4096]; // bytecodes
  int code_count;
  char *constants[256]; // pool of string literals
  int const_count;
  char *names[256]; // name table (variables + functions)
  int name_count;
  FunctionEntry functions[64];
  int func_count;
  ClassEntry classes[16];
  int class_count;
} Chunk;

// Append an instruction and return its index in the chunk.
int emit(Chunk *chunk, Instruction instr) {
  chunk->code[chunk->code_count++] = instr;
  return chunk->code_count - 1;
}

// Return the index of str in the constant pool, adding it if it is new.
int add_constant(Chunk *chunk, char *str) {
  for (int i = 0; i < chunk->const_count; i++) {
    if (strcmp(chunk->constants[i], str) == 0) {
      return i;
    }
  }
  chunk->constants[chunk->const_count++] = str;
  return chunk->const_count - 1;
}

// Build an instruction with a zeroed operand, so that opcodes without an
// operand never carry uninitialized data.
Instruction make_instruction(OpCode op) {
  Instruction instr = {0};
  instr.op = op;
  return instr;
}

// Return the index of name in the name table, adding it if it is new.
int add_name(Chunk *chunk, char *name) {
  for (int i = 0; i < chunk->name_count; i++) {
    if (strcmp(chunk->names[i], name) == 0) {
      return i;
    }
  }
  chunk->names[chunk->name_count++] = name;
  return chunk->name_count - 1;
}
int compile_node(Chunk *chunk, ASTNode *node) {
switch (node->type) {
case NODE_INT_LIT: {
Instruction instr = make_instruction(OP_CONST_INT);
instr.operand.int_val = node->data.int_val;
return emit(chunk, instr);
}
case NODE_STRING_LIT: {
Instruction instr = make_instruction(OP_CONST_STRING);
instr.operand.str_index = add_constant(chunk, node->data.string_val);
return emit(chunk, instr);
}
case NODE_VAR: {
Instruction instr = make_instruction(OP_LOAD_VAR);
instr.operand.var_index = add_name(chunk, node->data.string_val);
return emit(chunk, instr);
}
case NODE_ASSIGN: {
compile_node(chunk, node->data.assign.value);
Instruction instr = make_instruction(OP_STORE_VAR);
instr.operand.var_index = add_name(chunk, node->data.assign.name);
return emit(chunk, instr);
}
case NODE_CALL: {
// Compile each argument and push it onto the stack
for (int i = 0; i < node->data.call.arg_count; i++) {
compile_node(chunk, node->data.call.args[i]);
}
// Check whether the callee is a class constructor
for (int i = 0; i < chunk->class_count; i++) {
if (strcmp(chunk->classes[i].name, node->data.call.name) == 0) {
// Look up init to validate the argument count
for (int m = 0; m < chunk->classes[i].method_count; m++) {
if (strcmp(chunk->classes[i].methods[m].method_name, "init") == 0) {
int expected =
chunk->classes[i].methods[m].param_count - 1; // -1 for self
if (node->data.call.arg_count != expected) {
printf("error: %s() expects %d args, but received %d\n",
node->data.call.name, expected, node->data.call.arg_count);
exit(1);
}
break;
}
}
break;
}
}
// Register the callee's name and emit the call
Instruction instr = make_instruction(OP_CALL);
instr.operand.call.arg_count = node->data.call.arg_count;
instr.operand.call.name_index = add_name(chunk, node->data.call.name);
return emit(chunk, instr);
}
case NODE_BLOCK: {
int n = node->data.block.count;
// NOP for gc
emit(chunk, make_instruction(OP_NOP));
for (int i = 0; i < n; i++) {
compile_node(chunk, node->data.block.stmts[i]);
}
return 0;
}
case NODE_BINOP: {
// Operands are compiled left to right, so the right operand ends up on top
compile_node(chunk, node->data.binop.left);
compile_node(chunk, node->data.binop.right);
OpCode opCode;
switch (node->data.binop.op) {
case '+':
opCode = OP_ADD;
break;
case '-':
opCode = OP_SUB;
break;
case '*':
opCode = OP_MUL;
break;
case '/':
opCode = OP_DIV;
break;
case '>':
opCode = OP_CMP_GT;
break;
case '<':
opCode = OP_CMP_LT;
break;
default:
// Fail loudly instead of emitting an uninitialized opcode
printf("error: unsupported binary operator '%c'\n", node->data.binop.op);
exit(1);
}
emit(chunk, make_instruction(opCode));
return 0;
}
case NODE_WHILE: {
int loop_start = chunk->code_count;
compile_node(chunk, node->data.while_loop.cond);
// jump if zero, zero = false
Instruction instr = make_instruction(OP_JUMP_IF_ZERO);
instr.operand.jump_target = -1;
int jump_offset = emit(chunk, instr);
// compile body
compile_node(chunk, node->data.while_loop.body);
instr = make_instruction(OP_JUMP);
instr.operand.jump_target = loop_start;
emit(chunk, instr);
// Backpatching: point the exit jump at the first instruction after the loop
chunk->code[jump_offset].operand.jump_target = chunk->code_count;
break;
}
case NODE_IF: {
// compile condition
compile_node(chunk, node->data.if_statement.cond);
// add jump if zero
Instruction instr = make_instruction(OP_JUMP_IF_ZERO);
instr.operand.jump_target = -1;
int jump_offset = emit(chunk, instr);
// compile body
compile_node(chunk, node->data.if_statement.body);
chunk->code[jump_offset].operand.jump_target = chunk->code_count;
break;
}
case NODE_RETURN: {
if (node->data.ret.value) {
compile_node(chunk, node->data.ret.value);
}
emit(chunk, make_instruction(OP_RETURN));
return 0;
}
case NODE_FN_DEF: {
// Emit a jump so straight-line execution skips over the function body
Instruction jump = make_instruction(OP_JUMP);
jump.operand.jump_target = -1; // backpatched below
int jump_idx = emit(chunk, jump);
// Register the function's entry point
if (chunk->func_count >= 64) {
printf("error: too many functions (max 64)\n");
exit(1);
}
int entry = chunk->code_count;
FunctionEntry *fn = &chunk->functions[chunk->func_count++];
fn->name = node->data.fn_def.name;
fn->entry_point = entry;
fn->param_count = node->data.fn_def.param_count;
fn->param_names = node->data.fn_def.params;
// Emit a STORE_VAR per parameter in reverse order, because the last
// argument is on top of the stack
for (int i = node->data.fn_def.param_count - 1; i >= 0; i--) {
Instruction store = make_instruction(OP_STORE_VAR);
store.operand.var_index = add_name(chunk, node->data.fn_def.params[i]);
emit(chunk, store);
}
// Compile the body
compile_node(chunk, node->data.fn_def.body);
// Emit an implicit return in case the body has no explicit one
emit(chunk, make_instruction(OP_RETURN));
// backpatch jump
chunk->code[jump_idx].operand.jump_target = chunk->code_count;
break;
}
case NODE_CLASS_DEF: {
// Register the ClassEntry
if (chunk->class_count >= 16) {
printf("error: too many classes (max 16)\n");
exit(1);
}
ClassEntry *cls = &chunk->classes[chunk->class_count++];
cls->name = node->data.class_def.name;
cls->method_count = 0;
// Pre-register self in the name table
add_name(chunk, "self");
int totalMethods = node->data.class_def.method_count;
if (totalMethods > 16) {
printf("error: class %s has too many methods (max 16)\n", cls->name);
exit(1);
}
for (int i = 0; i < totalMethods; i++) {
ASTNode *method = node->data.class_def.methods[i];
// Jump over the method body during straight-line execution
Instruction jump = make_instruction(OP_JUMP);
jump.operand.jump_target = -1;
int jump_idx = emit(chunk, jump);
// Register the method entry
int entry = chunk->code_count;
MethodEntry *me = &cls->methods[cls->method_count++];
me->method_name = method->data.fn_def.name;
me->entry_point = entry;
me->param_count = method->data.fn_def.param_count;
me->param_names = method->data.fn_def.params;
// Emit a STORE_VAR per parameter (reverse order)
for (int p = method->data.fn_def.param_count - 1; p >= 0; p--) {
Instruction store = make_instruction(OP_STORE_VAR);
store.operand.var_index =
add_name(chunk, method->data.fn_def.params[p]);
emit(chunk, store);
}
// Compile the body
compile_node(chunk, method->data.fn_def.body);
// Implicit return
emit(chunk, make_instruction(OP_RETURN));
// Backpatch
chunk->code[jump_idx].operand.jump_target = chunk->code_count;
}
break;
}
case NODE_DOT_ACCESS: {
compile_node(chunk, node->data.dot_access.object);
Instruction instr = make_instruction(OP_GET_FIELD);
instr.operand.var_index = add_name(chunk, node->data.dot_access.field);
emit(chunk, instr);
break;
}
case NODE_DOT_ASSIGN: {
compile_node(chunk, node->data.dot_assign.value); // push the value
compile_node(chunk, node->data.dot_assign.object); // push the instance
Instruction instr = make_instruction(OP_SET_FIELD);
instr.operand.var_index = add_name(chunk, node->data.dot_assign.field);
emit(chunk, instr);
break;
}
case NODE_METHOD_CALL: {
compile_node(chunk, node->data.method_call.object); // push the instance
for (int i = 0; i < node->data.method_call.arg_count; i++) {
compile_node(chunk, node->data.method_call.args[i]);
}
Instruction instr = make_instruction(OP_CALL_METHOD);
instr.operand.call.name_index =
add_name(chunk, node->data.method_call.method);
instr.operand.call.arg_count = node->data.method_call.arg_count;
emit(chunk, instr);
break;
}
default:
break;
}
return 0;
}
Chunk *compile(ASTNode *root) {
// Allocate a zero-initialized chunk
Chunk *chunk = (Chunk *)calloc(1, sizeof(Chunk));
if (chunk == NULL) {
printf("error: out of memory\n");
exit(1);
}
compile_node(chunk, root);
// Terminate the program so the VM loop stops
emit(chunk, make_instruction(OP_HALT));
return chunk;
}
void print_chunk(Chunk *chunk) {
printf("=== Names (%d) ===\n", chunk->name_count);
for (int i = 0; i < chunk->name_count; i++) {
printf(" [%d] %s\n", i, chunk->names[i]);
}
printf("=== Constants (%d) ===\n", chunk->const_count);
for (int i = 0; i < chunk->const_count; i++) {
printf(" [%d] \"%s\"\n", i, chunk->constants[i]);
}
printf("=== Bytecode (%d instructions) ===\n", chunk->code_count);
for (int i = 0; i < chunk->code_count; i++) {
Instruction instr = chunk->code[i];
printf("%04d ", i);
switch (instr.op) {
case OP_CONST_INT:
printf("CONST_INT %d", instr.operand.int_val);
break;
case OP_CONST_STRING:
printf("CONST_STRING [%d] \"%s\"", instr.operand.str_index,
chunk->constants[instr.operand.str_index]);
break;
case OP_POP:
printf("POP");
break;
case OP_ADD:
printf("ADD");
break;
case OP_SUB:
printf("SUB");
break;
case OP_MUL:
printf("MUL");
break;
case OP_DIV:
printf("DIV");
break;
case OP_NEG:
printf("NEG");
break;
case OP_CMP_LT:
printf("CMP_LT");
break;
case OP_CMP_GT:
printf("CMP_GT");
break;
case OP_LOAD_VAR:
printf("LOAD_VAR [%d] %s", instr.operand.var_index,
chunk->names[instr.operand.var_index]);
break;
case OP_STORE_VAR:
printf("STORE_VAR [%d] %s", instr.operand.var_index,
chunk->names[instr.operand.var_index]);
break;
case OP_JUMP:
printf("JUMP -> %04d", instr.operand.jump_target);
break;
case OP_JUMP_IF_ZERO:
printf("JUMP_IF_ZERO -> %04d", instr.operand.jump_target);
break;
case OP_CALL:
printf("CALL %s(%d args)", chunk->names[instr.operand.call.name_index],
instr.operand.call.arg_count);
break;
case OP_RETURN:
printf("RETURN");
break;
case OP_NOP:
printf("NOP");
break;
case OP_HALT:
printf("HALT");
break;
case OP_GET_FIELD:
printf("GET_FIELD [%d] %s", instr.operand.var_index,
chunk->names[instr.operand.var_index]);
break;
case OP_SET_FIELD:
printf("SET_FIELD [%d] %s", instr.operand.var_index,
chunk->names[instr.operand.var_index]);
break;
case OP_CALL_METHOD:
printf("CALL_METHOD %s(%d args)",
chunk->names[instr.operand.call.name_index],
instr.operand.call.arg_count);
break;
default:
printf("UNKNOWN op=%d", instr.op);
break;
}
printf("\n");
}
printf("=== User Functions ===\n");
for (int i = 0; i < chunk->func_count; i++) {
FunctionEntry *fn = &chunk->functions[i];
printf("[%.4d] %s(", fn->entry_point, fn->name);
for (int p = 0; p < fn->param_count; p++) {
printf("%s", fn->param_names[p]);
if (p < fn->param_count - 1) {
printf(", ");
}
}
printf(")\n");
}
printf("=== Classes ===\n");
for (int i = 0; i < chunk->class_count; i++) {
ClassEntry *cls = &chunk->classes[i];
printf("class %s (%d methods)\n", cls->name, cls->method_count);
for (int m = 0; m < cls->method_count; m++) {
MethodEntry *me = &cls->methods[m];
printf(" [%.4d] %s(", me->entry_point, me->method_name);
for (int p = 0; p < me->param_count; p++) {
printf("%s", me->param_names[p]);
if (p < me->param_count - 1)
printf(", ");
}
printf(")\n");
}
}
printf("=== End ===\n");
}
#endif
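
The `NODE_WHILE`, `NODE_IF` and `NODE_FN_DEF` cases above all rely on backpatching: a jump is emitted with a placeholder target and fixed up once the target index is known. The following is a reduced, standalone sketch of that pattern. The `Demo*` types and functions are hypothetical stand-ins for illustration, not the real `Chunk` and `Instruction`.

```c
/* Hypothetical, reduced instruction set used only for this sketch. */
typedef enum { DEMO_PUSH, DEMO_JUMP_IF_ZERO, DEMO_JUMP, DEMO_HALT } DemoOp;

typedef struct {
    DemoOp op;
    int operand; /* immediate value or jump target */
} DemoInstr;

typedef struct {
    DemoInstr code[64];
    int count;
} DemoChunk;

/* Appends an instruction and returns its index, or -1 if the buffer is full. */
int demo_emit(DemoChunk *chunk, DemoOp op, int operand) {
    if (chunk->count >= 64) {
        return -1;
    }
    chunk->code[chunk->count].op = op;
    chunk->code[chunk->count].operand = operand;
    return chunk->count++;
}

/* Emits the skeleton of `while (cond) { body }`, with a single push standing
 * in for the condition and for the body. Returns the index of the exit jump. */
int demo_compile_while(DemoChunk *chunk, int cond_value, int body_value) {
    int loop_start = chunk->count;
    demo_emit(chunk, DEMO_PUSH, cond_value);
    /* The exit target is not known yet, so emit a placeholder. */
    int exit_jump = demo_emit(chunk, DEMO_JUMP_IF_ZERO, -1);
    demo_emit(chunk, DEMO_PUSH, body_value);
    /* Back edge to re-evaluate the condition. */
    demo_emit(chunk, DEMO_JUMP, loop_start);
    /* Backpatch: the loop exits to the first instruction after the back edge. */
    chunk->code[exit_jump].operand = chunk->count;
    return exit_jump;
}
```

Recording the index returned by the emit call is what makes the later fix-up possible, which is why `emit` in the header returns the position of the instruction it just wrote.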

@@ -0,0 +1,43 @@
#ifndef JLANG_OPCODES_H
#define JLANG_OPCODES_H
typedef enum {
OP_CONST_INT, // push an immediate integer
OP_CONST_STRING, // push a string from the constant pool (allocated on the heap)
OP_POP, // discard the top of the stack
OP_ADD,
OP_SUB,
OP_MUL,
OP_DIV, // arithmetic
OP_NEG, // unary negation
OP_CMP_LT,
OP_CMP_GT, // comparison -> push 0 or 1
OP_LOAD_VAR, // push a variable by index
OP_STORE_VAR, // pop -> store into a variable by index
OP_JUMP, // unconditional jump
OP_JUMP_IF_ZERO, // pop -> jump if the value is 0 (false)
OP_CALL, // call a user function, class constructor, or built-in by name index
OP_RETURN, // return from a function (pop the call frame)
OP_NOP, // no-op; the VM uses it as the point where the GC runs
OP_HALT, // stop the VM
OP_GET_FIELD, // TOS=instance, operand=name_idx → push field value
OP_SET_FIELD, // TOS=instance, TOS-1=value, operand=name_idx → set field
OP_CALL_METHOD, // TOS-N-1=instance + N args, operand={name_idx, arg_count}
} OpCode;
typedef struct {
OpCode op;
union {
int int_val; // OP_CONST_INT
int str_index; // OP_CONST_STRING: index into the constant pool
int var_index; // OP_LOAD_VAR, OP_STORE_VAR, OP_GET_FIELD, OP_SET_FIELD
int jump_target; // OP_JUMP, OP_JUMP_IF_ZERO
struct {
int name_index;
int arg_count;
} call; // OP_CALL, OP_CALL_METHOD
} operand;
} Instruction;
#endif

@@ -0,0 +1,26 @@
#ifndef JLANG_VALUE_H
#define JLANG_VALUE_H
#include "opcodes.h"
#include <stdlib.h>
typedef enum
{
VAL_INT,
VAL_FLOAT,
VAL_OBJ,
VAL_NONE,
} ValueType;
typedef struct
{
ValueType type;
union
{
int int_val;
double float_val;
size_t heap_offset; // heap offset of the object for VAL_OBJ (strings, lists)
} as;
} Value;
#endif
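
A short sketch of how the tagged union could be used safely: constructor helpers that always set the tag, and a truthiness check that switches on the tag instead of reading `as.int_val` unconditionally. The enum and struct are repeated from `value.h` so the block compiles on its own. The helper names and the rule that objects are truthy and `VAL_NONE` is falsy are assumptions for illustration, not part of this change.

```c
#include <stddef.h>

typedef enum { VAL_INT, VAL_FLOAT, VAL_OBJ, VAL_NONE } ValueType;

typedef struct {
    ValueType type;
    union {
        int int_val;
        double float_val;
        size_t heap_offset;
    } as;
} Value;

/* Hypothetical constructors: each sets the tag together with the payload. */
Value value_int(int n) {
    Value v = {0};
    v.type = VAL_INT;
    v.as.int_val = n;
    return v;
}

Value value_obj(size_t offset) {
    Value v = {0};
    v.type = VAL_OBJ;
    v.as.heap_offset = offset;
    return v;
}

Value value_none(void) {
    Value v = {0};
    v.type = VAL_NONE;
    return v;
}

/* Returns 1 when the value counts as false. Only the union member that
 * matches the tag is read. */
int value_is_falsy(Value v) {
    switch (v.type) {
    case VAL_INT:
        return v.as.int_val == 0;
    case VAL_FLOAT:
        return v.as.float_val == 0.0;
    case VAL_NONE:
        return 1;
    case VAL_OBJ:
        return 0; /* assumption: any heap object is truthy */
    }
    return 0;
}
```

Note that a zero-initialized `Value` has `type == VAL_INT`, because `VAL_INT` is the first enumerator. A value meant to represent "nothing" therefore needs its tag set explicitly, as `value_none` does.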

518
src/backend/bytecode/vm.h Normal file
@@ -0,0 +1,518 @@
#ifndef JLANG_VM_H
#define JLANG_VM_H
#include "../../memory/gc.h"
#include "compiler.h"
#include "value.h"
typedef struct {
int return_ip; // instruction to resume at after the call
int saved_sp; // stack base to restore on return
Value saved_vars[256]; // snapshot of the caller's variables
int saved_var_set[256];
int is_constructor;
Value constructor_instance;
} CallFrame;
typedef struct {
Chunk *chunk;
int ip; // instruction pointer
Value stack[1024];
int sp; // stack pointer
Value vars[256]; // variables by index
int var_set[256]; // 0 = undefined, 1 = defined
CallFrame frames[64];
int frame_count;
JLANG_memory_allocator *allocator;
} VM;
void run_vm(VM *vm) {
while (1) {
Instruction instr = vm->chunk->code[vm->ip];
switch (instr.op) {
case OP_HALT:
// Stop vm
return;
case OP_JUMP: {
// Go to instruction
vm->ip = instr.operand.jump_target;
continue;
}
case OP_JUMP_IF_ZERO: {
// pop from stack
Value var1 = vm->stack[--vm->sp];
if (var1.as.int_val == 0) {
vm->ip = instr.operand.jump_target;
continue;
}
break;
}
case OP_CONST_INT: {
// push value to stack
Value v = {0};
v.type = VAL_INT;
v.as.int_val = instr.operand.int_val;
vm->stack[vm->sp++] = v;
break;
}
case OP_CONST_STRING: {
// Create obj
size_t strOffsetHeap = obj_new_string(
vm->allocator, vm->chunk->constants[instr.operand.str_index]);
// Push to stack
Value v = {0};
v.type = VAL_OBJ;
v.as.heap_offset = strOffsetHeap;
vm->stack[vm->sp++] = v;
break;
}
case OP_STORE_VAR: {
// pop from the stack
Value v = vm->stack[--vm->sp];
int idx = instr.operand.var_index;
// store vm->vars and mark vm->var_set
vm->vars[idx] = v;
vm->var_set[idx] = 1;
break;
}
case OP_LOAD_VAR: {
// get from vm->var
int idx = instr.operand.var_index;
if (!vm->var_set[idx]) {
printf("error: variable '%s' no definida\n", vm->chunk->names[idx]);
return;
}
Value v = vm->vars[idx];
// push to stack
vm->stack[vm->sp++] = v;
break;
}
case OP_CALL: {
int nameIdx = instr.operand.call.name_index;
char *name = vm->chunk->names[nameIdx];
// check if is an user function
FunctionEntry *fn = NULL;
for (int i = 0; i < vm->chunk->func_count; i++) {
if (strcmp(vm->chunk->functions[i].name, name) == 0) {
fn = &vm->chunk->functions[i];
break;
}
}
if (fn != NULL) {
// Refuse to overflow the fixed-size frame stack (e.g. on deep recursion)
if (vm->frame_count >= 64) {
printf("error: call stack overflow\n");
return;
}
// Save the current state in a call frame
CallFrame *frame = &vm->frames[vm->frame_count++];
frame->return_ip = vm->ip + 1; // resume at the next instruction
frame->saved_sp = vm->sp - fn->param_count;
frame->is_constructor = 0;
memcpy(frame->saved_vars, vm->vars, sizeof(vm->vars));
memcpy(frame->saved_var_set, vm->var_set, sizeof(vm->var_set));
// Jump to the entry point
vm->ip = fn->entry_point;
continue; // skip the ip++ at the bottom of the loop
}
// Otherwise look the name up in classes[] (constructor call)
ClassEntry *cls = NULL;
int class_idx = -1;
for (int i = 0; i < vm->chunk->class_count; i++) {
if (strcmp(vm->chunk->classes[i].name, name) == 0) {
cls = &vm->chunk->classes[i];
class_idx = i;
break;
}
}
if (cls != NULL) {
// Allocate the instance
size_t instOffset =
obj_new_instance(vm->allocator, class_idx, 8, sizeof(Value));
Value instVal = {0};
instVal.type = VAL_OBJ;
instVal.as.heap_offset = instOffset;
// Look up init
MethodEntry *init = NULL;
for (int m = 0; m < cls->method_count; m++) {
if (strcmp(cls->methods[m].method_name, "init") == 0) {
init = &cls->methods[m];
break;
}
}
if (init != NULL) {
// Refuse to overflow the fixed-size frame stack
if (vm->frame_count >= 64) {
printf("error: call stack overflow\n");
return;
}
// Insert the instance below the args so init receives it as self
int nArgs = instr.operand.call.arg_count;
for (int a = vm->sp - 1; a >= vm->sp - nArgs; a--) {
vm->stack[a + 1] = vm->stack[a];
}
vm->stack[vm->sp - nArgs] = instVal;
vm->sp++;
// Save frame
CallFrame *frame = &vm->frames[vm->frame_count++];
frame->return_ip = vm->ip + 1;
frame->saved_sp = vm->sp - nArgs - 1;
frame->is_constructor = 1;
frame->constructor_instance = instVal;
memcpy(frame->saved_vars, vm->vars, sizeof(vm->vars));
memcpy(frame->saved_var_set, vm->var_set, sizeof(vm->var_set));
vm->ip = init->entry_point;
continue;
} else {
// No init: discard the args and return the instance
vm->sp -= instr.operand.call.arg_count;
vm->stack[vm->sp++] = instVal;
}
break;
}
if (strcmp(name, "print") == 0 || strcmp(name, "println") == 0) {
int nParams = instr.operand.call.arg_count;
for (int i = 0; i < nParams; i++) {
Value v = vm->stack[vm->sp - nParams + i];
switch (v.type) {
case VAL_INT:
printf("%d", v.as.int_val);
break;
case VAL_OBJ: {
// Get object from heap
obj_print(vm->allocator, v.as.heap_offset, "", "");
break;
}
default:
break;
}
}
vm->sp -= nParams;
if (strcmp(name, "println") == 0) {
printf("\n");
}
} else if (strcmp(name, "debugHeap") == 0) {
printf("\n");
JLANG_visualize(vm->allocator);
break;
} else if (strcmp(name, "len") == 0) {
// pop value from stack
Value var1 = vm->stack[--vm->sp];
// Always push a result so the stack stays balanced; non-objects yield 0
Value result = {0};
result.type = VAL_INT;
if (var1.type == VAL_OBJ) {
// Resolve obj
Object *obj = JLANG_RESOLVE(vm->allocator, var1.as.heap_offset);
result.as.int_val = obj->data.string_val.length;
}
// push to stack
vm->stack[vm->sp++] = result;
break;
} else {
printf("error: function '%s' not found!\n", name);
return;
}
break;
}
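case OP_RETURN: {
// A return at top level has no call frame to restore
if (vm->frame_count == 0) {
printf("error: return outside of a function\n");
return;
}
// Capture the return value if there is one on the stack
Value return_val = {0};
int has_return = 0;
if (vm->sp > vm->frames[vm->frame_count - 1].saved_sp) {
return_val = vm->stack[--vm->sp];
has_return = 1;
}
// Restore the call frame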
CallFrame *frame = &vm->frames[--vm->frame_count];
vm->ip = frame->return_ip;
vm->sp = frame->saved_sp;
memcpy(vm->vars, frame->saved_vars, sizeof(vm->vars));
memcpy(vm->var_set, frame->saved_var_set, sizeof(vm->var_set));
// Push return value
if (frame->is_constructor) {
vm->stack[vm->sp++] = frame->constructor_instance;
} else if (has_return) {
vm->stack[vm->sp++] = return_val;
} else {
Value nil = {0};
vm->stack[vm->sp++] = nil;
}
continue;
}
case OP_ADD: {
// Pop from stack
Value var2 = vm->stack[--vm->sp];
Value var1 = vm->stack[--vm->sp];
if (var1.type != var2.type) {
printf("panic: var types mismatch on OP_ADD\n");
return;
}
Value result = {0};
if (var1.type == VAL_INT) {
result.type = VAL_INT;
result.as.int_val = var1.as.int_val + var2.as.int_val;
} else if (var1.type == VAL_OBJ) {
// resolve obj
Object *obj1 = JLANG_RESOLVE(vm->allocator, var1.as.heap_offset);
Object *obj2 = JLANG_RESOLVE(vm->allocator, var2.as.heap_offset);
// get chars
char *str1 = JLANG_RESOLVE(vm->allocator, obj1->data.string_val.chars);
char *str2 = JLANG_RESOLVE(vm->allocator, obj2->data.string_val.chars);
// tmp char buffer
size_t total =
obj1->data.string_val.length + obj2->data.string_val.length;
char *tmpBuffer = (char *)malloc(total + 1);
memcpy(tmpBuffer, str1, obj1->data.string_val.length);
memcpy(tmpBuffer + obj1->data.string_val.length, str2,
obj2->data.string_val.length);
tmpBuffer[total] = '\0';
// Create new str
size_t strHeapIndex = obj_new_string(vm->allocator, tmpBuffer);
free(tmpBuffer);
// set value
result.type = VAL_OBJ;
result.as.heap_offset = strHeapIndex;
}
// Push to stack
vm->stack[vm->sp++] = result;
break;
}
case OP_SUB: {
// Pop from stack
Value var2 = vm->stack[--vm->sp];
Value var1 = vm->stack[--vm->sp];
Value result = {0};
result.type = VAL_INT;
result.as.int_val = var1.as.int_val - var2.as.int_val;
// Push to stack
vm->stack[vm->sp++] = result;
break;
}
case OP_MUL: {
// Pop from stack
Value var2 = vm->stack[--vm->sp];
Value var1 = vm->stack[--vm->sp];
Value result = {0};
result.type = VAL_INT;
result.as.int_val = var1.as.int_val * var2.as.int_val;
// Push to stack
vm->stack[vm->sp++] = result;
break;
}
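case OP_DIV: {
// Pop from stack
Value var2 = vm->stack[--vm->sp];
Value var1 = vm->stack[--vm->sp];
// Integer division by zero is undefined behavior in C, so stop with an error
if (var2.as.int_val == 0) {
printf("error: division by zero\n");
return;
}
Value result = {0};
result.type = VAL_INT;
result.as.int_val = var1.as.int_val / var2.as.int_val;
// Push to stack
vm->stack[vm->sp++] = result;
break;
}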
case OP_CMP_GT: {
// Pop from stack
Value var2 = vm->stack[--vm->sp];
Value var1 = vm->stack[--vm->sp];
Value result = {0};
result.type = VAL_INT;
result.as.int_val = var1.as.int_val > var2.as.int_val;
// Push to stack
vm->stack[vm->sp++] = result;
break;
}
case OP_CMP_LT: {
// Pop from stack
Value var2 = vm->stack[--vm->sp];
Value var1 = vm->stack[--vm->sp];
Value result = {0};
result.type = VAL_INT;
result.as.int_val = var1.as.int_val < var2.as.int_val;
// Push to stack
vm->stack[vm->sp++] = result;
break;
}
case OP_NOP: {
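// Run the GC, using variables, instance fields, and the stack as roots.
// Worst case: 256 variables that are each an instance with 8 fields, plus
// 1024 stack slots (assuming the 8 passed to obj_new_instance is the field
// capacity), so 512 entries would not be enough.
size_t roots[256 * 9 + 1024];
int root_count = 0;
for (int i = 0; i < 256; i++) {
if (vm->var_set[i] && vm->vars[i].type == VAL_OBJ) {
roots[root_count++] = vm->vars[i].as.heap_offset;
// If it is an instance, add its object-typed fields as roots too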
Object *obj =
JLANG_RESOLVE(vm->allocator, vm->vars[i].as.heap_offset);
if (obj->type == OBJ_INSTANCE) {
Value *values = (Value *)JLANG_RESOLVE(
vm->allocator, obj->data.instance_val.field_values);
for (int f = 0; f < obj->data.instance_val.field_count; f++) {
if (values[f].type == VAL_OBJ) {
roots[root_count++] = values[f].as.heap_offset;
}
}
}
}
}
for (int i = 0; i < vm->sp; i++) {
if (vm->stack[i].type == VAL_OBJ) {
roots[root_count++] = vm->stack[i].as.heap_offset;
}
}
gc_collect(vm->allocator, roots, root_count);
break;
}
case OP_GET_FIELD: {
Value instance = vm->stack[--vm->sp];
Object *obj = JLANG_RESOLVE(vm->allocator, instance.as.heap_offset);
int name_idx = instr.operand.var_index;
int *names = (int *)JLANG_RESOLVE(vm->allocator,
obj->data.instance_val.field_names);
Value *values = (Value *)JLANG_RESOLVE(
vm->allocator, obj->data.instance_val.field_values);
int found = 0;
for (int i = 0; i < obj->data.instance_val.field_count; i++) {
if (names[i] == name_idx) {
vm->stack[vm->sp++] = values[i];
found = 1;
break;
}
}
if (!found) {
printf("error: field '%s' not found\n", vm->chunk->names[name_idx]);
return;
}
break;
}
case OP_SET_FIELD: {
Value instance = vm->stack[--vm->sp];
Value value = vm->stack[--vm->sp];
int name_idx = instr.operand.var_index;
Object *obj = JLANG_RESOLVE(vm->allocator, instance.as.heap_offset);
int *names = (int *)JLANG_RESOLVE(vm->allocator,
obj->data.instance_val.field_names);
Value *values = (Value *)JLANG_RESOLVE(
vm->allocator, obj->data.instance_val.field_values);
int found = 0;
for (int i = 0; i < obj->data.instance_val.field_count; i++) {
if (names[i] == name_idx) {
values[i] = value;
found = 1;
break;
}
}
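if (!found) {
// Add a new field. Assumption: the 8 passed to obj_new_instance in
// OP_CALL is the field capacity, so a ninth field would write past it.
if (obj->data.instance_val.field_count >= 8) {
printf("error: too many fields on instance (max 8)\n");
return;
}
int idx = obj->data.instance_val.field_count++;
names[idx] = name_idx;
values[idx] = value;
}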
break;
}
case OP_CALL_METHOD: {
int method_name_idx = instr.operand.call.name_index;
int arg_count = instr.operand.call.arg_count;
char *method_name = vm->chunk->names[method_name_idx];
// The instance sits below the args
Value instance = vm->stack[vm->sp - arg_count - 1];
Object *obj = JLANG_RESOLVE(vm->allocator, instance.as.heap_offset);
ClassEntry *cls = &vm->chunk->classes[obj->data.instance_val.class_index];
// Look up the method
MethodEntry *method = NULL;
for (int i = 0; i < cls->method_count; i++) {
if (strcmp(cls->methods[i].method_name, method_name) == 0) {
method = &cls->methods[i];
break;
}
}
if (method == NULL) {
printf("error: method '%s' not found in class '%s'\n", method_name,
cls->name);
return;
}
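// Save frame (refusing to overflow the fixed-size frame stack)
if (vm->frame_count >= 64) {
printf("error: call stack overflow\n");
return;
}
CallFrame *frame = &vm->frames[vm->frame_count++];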
frame->return_ip = vm->ip + 1;
frame->saved_sp = vm->sp - arg_count - 1;
frame->is_constructor = 0;
memcpy(frame->saved_vars, vm->vars, sizeof(vm->vars));
memcpy(frame->saved_var_set, vm->var_set, sizeof(vm->var_set));
vm->ip = method->entry_point;
continue;
}
default:
break;
}
// go to next instruction
vm->ip++;
}
}
#endif
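
The core of `run_vm` is a fetch/switch loop over a value stack. The sketch below reduces that idea to integers and three operations and adds the stack bounds checks that the loop above omits. The `Mini*` names are hypothetical and exist only in this example. The sample program in the usage note is what the expression `8 + 2 * 4` from the precedence fix would compile to.

```c
/* Hypothetical, reduced opcode set used only for this sketch. */
typedef enum { MINI_PUSH, MINI_ADD, MINI_MUL, MINI_HALT } MiniOp;

typedef struct {
    MiniOp op;
    int operand; /* only used by MINI_PUSH */
} MiniInstr;

#define MINI_STACK_MAX 16

/* Runs `code` until MINI_HALT and stores the top of the stack in *result.
 * Returns 0 on success, -1 on stack overflow, underflow or unknown opcode. */
int mini_run(const MiniInstr *code, int *result) {
    int stack[MINI_STACK_MAX];
    int sp = 0;
    for (int ip = 0;; ip++) {
        MiniInstr instr = code[ip];
        switch (instr.op) {
        case MINI_PUSH:
            if (sp >= MINI_STACK_MAX) {
                return -1;
            }
            stack[sp++] = instr.operand;
            break;
        case MINI_ADD:
        case MINI_MUL: {
            if (sp < 2) {
                return -1;
            }
            /* The right operand was pushed last, so it is popped first. */
            int right = stack[--sp];
            int left = stack[--sp];
            stack[sp++] = (instr.op == MINI_ADD) ? left + right : left * right;
            break;
        }
        case MINI_HALT:
            if (sp < 1) {
                return -1;
            }
            *result = stack[sp - 1];
            return 0;
        default:
            return -1;
        }
    }
}
```

Running the sequence `PUSH 8, PUSH 2, PUSH 4, MUL, ADD, HALT` leaves 16 on the stack, because the multiplication is emitted before the addition.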

@@ -1,8 +1,8 @@
 #ifndef JLANG_EVAL_H
 #define JLANG_EVAL_H
-#include "../frontend/parser.h"
-#include "../memory/gc.h"
+#include "../../frontend/parser.h"
+#include "../../memory/gc.h"
 typedef struct {
 char *name;
@@ -41,16 +41,8 @@ void env_set(Environment *env, const char *name, size_t value) {
 int step = 0;
-size_t eval(ASTNode *node, Environment *env, void *allocator, int debug, int gc) {
-// Run GC
-if (gc) {
-size_t roots[256];
-for (int i = 0; i < env->count; i++) {
-roots[i] = env->vars[i].value;
-}
-gc_collect(allocator, roots, env->count);
-}
+size_t eval(ASTNode *node, Environment *env, void *allocator, int debug,
+int gc) {
 if (debug > 0) {
 step++;
@@ -62,6 +54,8 @@ size_t eval(ASTNode *node, Environment *env, void *allocator, int debug, int gc)
 }
 switch (node->type) {
+case NODE_STRING_LIT:
+return obj_new_string(allocator, node->data.string_val);
 case NODE_INT_LIT:
 return obj_new_int(allocator, node->data.int_val);
 case NODE_VAR:
@@ -81,6 +75,23 @@ size_t eval(ASTNode *node, Environment *env, void *allocator, int debug, int gc)
 // Operar (ints por ahora)
 if (node->data.binop.op == '+') {
+if (l->type == OBJ_STRING) {
+int n = l->data.string_val.length + r->data.string_val.length;
+char *tempBuff = (char *)malloc(n + 1);
+// Copy left text
+memcpy(tempBuff, JLANG_RESOLVE(allocator, l->data.string_val.chars),
+l->data.string_val.length);
+// Copy right text
+memcpy(tempBuff + l->data.string_val.length,
+JLANG_RESOLVE(allocator, r->data.string_val.chars),
+r->data.string_val.length);
+tempBuff[n] = '\0';
+size_t newObj = obj_new_string(allocator, tempBuff);
+free(tempBuff);
+return newObj;
+}
 return obj_new_int(allocator, l->data.int_val + r->data.int_val);
 } else if (node->data.binop.op == '-') {
 return obj_new_int(allocator, l->data.int_val - r->data.int_val);
@@ -94,13 +105,16 @@ size_t eval(ASTNode *node, Environment *env, void *allocator, int debug, int gc)
 return obj_new_int(allocator, l->data.int_val > r->data.int_val);
 }
 }
-case NODE_PRINT: {
-size_t val = eval(node->data.print.expr, env, allocator, debug, gc);
-obj_print(allocator, val, "");
-printf("\n");
-return val;
-}
 case NODE_BLOCK:
+// Run GC
+if (gc) {
+size_t roots[256];
+for (int i = 0; i < env->count; i++) {
+roots[i] = env->vars[i].value;
+}
+gc_collect(allocator, roots, env->count);
+}
 for (int i = 0; i < node->data.block.count; i++)
 eval(node->data.block.stmts[i], env, allocator, debug, gc);
 return 0;
@@ -112,6 +126,51 @@ size_t eval(ASTNode *node, Environment *env, void *allocator, int debug, int gc)
 break;
 eval(node->data.while_loop.body, env, allocator, debug, gc);
 }
+return 0;
+case NODE_IF: {
+size_t cond = eval(node->data.while_loop.cond, env, allocator, debug, gc);
+Object *obj = (Object *)JLANG_RESOLVE(allocator, cond);
+if (obj->data.int_val > 0) {
+eval(node->data.while_loop.body, env, allocator, debug, gc);
+}
+break;
+}
+case NODE_CALL: {
+if (strcmp(node->data.call.name, "print") == 0) {
+if (node->data.call.arg_count > 0) {
+size_t val = eval(node->data.call.args[0], env, allocator, debug, gc);
+obj_print(allocator, val, "", "");
+return val;
+}
+printf("");
+return 0;
+}
+if (strcmp(node->data.call.name, "println") == 0) {
+if (node->data.call.arg_count > 0) {
+size_t val = eval(node->data.call.args[0], env, allocator, debug, gc);
+obj_print(allocator, val, "", "");
+printf("\n");
+return val;
+}
+printf("\n");
+return 0;
+}
+if (strcmp(node->data.call.name, "len") == 0) {
+if (node->data.call.arg_count == 1) {
+size_t val = eval(node->data.call.args[0], env, allocator, debug, gc);
+Object *obj = (Object *) JLANG_RESOLVE(allocator, val);
+if (obj->type == OBJ_STRING) {
+return obj_new_int(allocator, obj->data.string_val.length);
+}
+}
+return 0;
+}
+printf("ERROR: funcion '%s' no definida\n", node->data.call.name);
+exit(1);
+}
 default:
 break;
 }

504
src/backend/mycpu/gencode.h Normal file
@@ -0,0 +1,504 @@
#ifndef JLANG_MYCPU_H
#define JLANG_MYCPU_H
#include "../../frontend/parser.h"
#include "opcodes.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CPU_NOT_FOUND 0xFFFF
typedef struct {
char *name;
short entry_point;
int param_count;
char **param_names;
} CPUFunctionEntry;
typedef struct {
CPUInstruction code[4096];
int code_count;
char *names[256];
unsigned short name_addr[256];
int name_count;
CPUFunctionEntry functions[64];
int func_count;
unsigned short ram_offset;
char *current_fn; // NULL while compiling global code
char **comments[4096];
int comments_per_code[4096];
} CPUChunk;
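unsigned short cpu_malloc(CPUChunk *chunk, size_t size) {
// Remember the current offset; it becomes the address of the new block.
// Kept unsigned to match ram_offset and the return type.
unsigned short current_addr = chunk->ram_offset;
// Bump the offset past the block
chunk->ram_offset += size;
return current_addr;
}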
short cpu_emit(CPUChunk *chunk, CPUInstruction instr) {
chunk->code[chunk->code_count++] = instr;
return chunk->code_count - 1;
}
CPUInstruction cpu_make_instruction(CPUOpCode op) {
CPUInstruction instr = {0};
instr.op = op;
return instr;
}
unsigned short cpu_find_name(CPUChunk *chunk, char *name) {
for (int i = 0; i < chunk->name_count; i++) {
if (strcmp(chunk->names[i], name) == 0) {
return chunk->name_addr[i];
}
}
return CPU_NOT_FOUND;
}
unsigned short cpu_add_name(CPUChunk *chunk, char *name) {
for (int i = 0; i < chunk->name_count; i++) {
if (strcmp(chunk->names[i], name) == 0) {
return chunk->name_addr[i];
}
}
chunk->names[chunk->name_count++] = name;
// Reserve one RAM slot for the name
chunk->name_addr[chunk->name_count - 1] = cpu_malloc(chunk, 1);
return chunk->name_addr[chunk->name_count - 1];
}
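void cpu_comment(CPUChunk *chunk, char *comment) {
unsigned short pc = chunk->code_count;
// comments[] has one slot per instruction; ignore comments past the end
if (pc >= 4096) {
return;
}
// Get comment count for pc
int total_comments = chunk->comments_per_code[pc];
if (total_comments >= 64) {
// Each instruction has room for 64 comments; drop the extras
return;
}
if (total_comments == 0) {
// Allocate space for comments
chunk->comments[pc] = malloc(sizeof(char *) * 64);
}
chunk->comments[pc][total_comments] = comment;
chunk->comments_per_code[pc] = total_comments + 1;
}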
short cpu_compile_node(CPUChunk *chunk, ASTNode *node, int depth) {
if (depth > 12) {
printf("error: max register depth reached\n");
exit(1);
}
switch (node->type) {
case NODE_INT_LIT: {
// Record a comment for the listing
char comment[64];
snprintf(comment, sizeof(comment), "NODE_INT_LIT %d", node->data.int_val);
cpu_comment(chunk, strdup(comment));
// Load the immediate into REG[depth]
CPUInstruction instr = cpu_make_instruction(II_ADD);
instr.param1 = node->data.int_val;
instr.param2 = 0;
instr.target = depth; // REG[depth]
cpu_emit(chunk, instr);
break;
}
case NODE_VAR: {
// Read from RAM into REG[depth]
// Look up the variable's address in the name table, creating it if needed
unsigned short var_addr;
if (chunk->current_fn) {
char mangled[128];
snprintf(mangled, sizeof(mangled), "%s.%s", chunk->current_fn,
node->data.string_val);
// Look for a function-local variable first. cpu_find_name only compares
// the string, so passing the stack buffer avoids leaking a strdup copy.
var_addr = cpu_find_name(chunk, mangled);
if (var_addr == CPU_NOT_FOUND) {
// Not a local: fall back to the global variable
var_addr = cpu_add_name(chunk, node->data.string_val);
}
} else {
var_addr = cpu_add_name(chunk, node->data.string_val);
}
// Add comment
cpu_comment(chunk, "REG13 = var_addr");
// REG13 = var_addr
CPUInstruction set_addr = cpu_make_instruction(II_ADD);
set_addr.param1 = var_addr;
set_addr.param2 = 0;
set_addr.target = 0x0D;
cpu_emit(chunk, set_addr);
// REG12 -> REG[DEPTH]
CPUInstruction mov_val = cpu_make_instruction(RI_ADD);
mov_val.param1 = 0x0C;
mov_val.param2 = 0;
mov_val.target = depth;
cpu_emit(chunk, mov_val);
break;
}
case NODE_ASSIGN: {
char comment[64];
snprintf(comment, sizeof(comment), "ASSIGN %s", node->data.assign.name);
cpu_comment(chunk, strdup(comment));
// 1. Compile the expression; the result goes in REG[depth]
cpu_compile_node(chunk, node->data.assign.value, depth);
// 2. Get the variable's RAM address
unsigned short addr;
if (chunk->current_fn) {
char mangled[128];
snprintf(mangled, sizeof(mangled), "%s.%s", chunk->current_fn,
node->data.assign.name);
// cpu_find_name only compares the string, so no strdup copy is needed
addr = cpu_find_name(chunk, mangled);
if (addr == CPU_NOT_FOUND) {
addr = cpu_add_name(chunk, node->data.assign.name);
}
} else {
addr = cpu_add_name(chunk, node->data.assign.name);
}
// 3. REG13 = addr
CPUInstruction set_addr = cpu_make_instruction(II_ADD);
set_addr.param1 = addr;
set_addr.param2 = 0;
set_addr.target = 0x0D;
cpu_emit(chunk, set_addr);
// 4. REG12 = REG[depth]
CPUInstruction set_value = cpu_make_instruction(RI_ADD);
set_value.param1 = depth; // REG[depth]
set_value.param2 = 0;
set_value.target = 0x0C;
cpu_emit(chunk, set_value);
// 5. RR_STR: RAM[REG13] = REG12 (store to RAM)
cpu_emit(chunk, cpu_make_instruction(RR_STR));
break;
}
case NODE_CALL: {
char comment[64];
snprintf(comment, sizeof(comment), "NODE_CALL %s", node->data.call.name);
cpu_comment(chunk, strdup(comment));
// Push the arguments onto the stack
for (int i = 0; i < node->data.call.arg_count; i++) {
cpu_compile_node(chunk, node->data.call.args[i], depth);
// push reg[depth]
cpu_comment(chunk, "RI_PUSH");
CPUInstruction push_arg = cpu_make_instruction(RI_PUSH);
push_arg.param1 = depth;
push_arg.target =
0xff; // if left at 0, the push overwrites that register with 0
cpu_emit(chunk, push_arg);
}
// Look up the function in the function table
unsigned short fn_addr = CPU_NOT_FOUND;
for (int i = 0; i < chunk->func_count; i++) {
if (strcmp(chunk->functions[i].name, node->data.call.name) == 0) {
fn_addr = chunk->functions[i].entry_point;
}
}
if (fn_addr == CPU_NOT_FOUND) {
printf("error: function '%s' is not defined\n", node->data.call.name);
exit(1);
}
cpu_comment(chunk, "IR_CALL");
CPUInstruction call_fn = cpu_make_instruction(IR_CALL);
call_fn.param1 = fn_addr * 4;
cpu_emit(chunk, call_fn);
if (depth != 0) {
CPUInstruction mov = cpu_make_instruction(RI_ADD);
mov.param1 = 0;
mov.param2 = 0;
mov.target = depth;
cpu_emit(chunk, mov);
}
break;
}
case NODE_BLOCK: {
// Iterate over the statements and compile each one
for (int i = 0; i < node->data.block.count; i++) {
cpu_compile_node(chunk, node->data.block.stmts[i], depth);
}
break;
}
case NODE_RETURN: {
if (node->data.ret.value) {
// By convention the result goes in REG0
cpu_compile_node(chunk, node->data.ret.value, 0);
}
CPUInstruction ret = cpu_make_instruction(RR_RET);
ret.target = 0xff;
cpu_emit(chunk, ret);
break;
}
case NODE_BINOP: {
cpu_compile_node(chunk, node->data.binop.left, depth);
cpu_compile_node(chunk, node->data.binop.right, depth + 1);
CPUOpCode op;
switch (node->data.binop.op) {
case '+': {
op = RR_ADD;
break;
}
case '-': {
op = RR_SUB;
break;
}
case '*': {
op = RR_MUL;
break;
}
case '/': {
op = RR_DIV;
break;
}
default: {
op = RR_ADD;
break;
}
}
CPUInstruction binop = cpu_make_instruction(op);
binop.param1 = depth;
binop.param2 = depth + 1;
binop.target = depth;
cpu_emit(chunk, binop);
break;
}
case NODE_FN_DEF: {
char comment[64];
snprintf(comment, sizeof(comment), "NODE_FN_DEF %s", node->data.fn_def.name);
cpu_comment(chunk, strdup(comment));
// Emit a jump so that normal execution skips over the function body
CPUInstruction jump = cpu_make_instruction(II_ADD);
jump.param1 = 0x00;
jump.param2 = 0x00;
jump.target = 0x0E;
short jump_idx = cpu_emit(chunk, jump);
// Record the entry point
short entry_point = chunk->code_count;
CPUFunctionEntry *fn = &chunk->functions[chunk->func_count++];
fn->name = node->data.fn_def.name;
fn->entry_point = entry_point;
fn->param_count = node->data.fn_def.param_count;
fn->param_names = node->data.fn_def.params;
CPUInstruction pop_ret = cpu_make_instruction(RR_POP);
pop_ret.target = 0x0B; // REG11 as a temporary
cpu_emit(chunk, pop_ret);
// Pop the parameters from the stack into RAM
for (int i = node->data.fn_def.param_count - 1; i >= 0; i--) {
// POP -> REG0
CPUInstruction pop = cpu_make_instruction(RR_POP);
pop.target = 0x00;
cpu_emit(chunk, pop);
// Store it in RAM as a local variable
char *param_name = node->data.fn_def.params[i];
char mangled[128];
snprintf(mangled, sizeof(mangled), "%s.%s", node->data.fn_def.name,
param_name);
short addr = cpu_add_name(chunk, strdup(mangled));
CPUInstruction set_addr = cpu_make_instruction(II_ADD);
set_addr.param1 = addr;
set_addr.target = 0x0D;
cpu_emit(chunk, set_addr);
CPUInstruction set_val = cpu_make_instruction(RI_ADD);
set_val.param1 = 0x00;
set_val.target = 0x0C;
cpu_emit(chunk, set_val);
cpu_emit(chunk, cpu_make_instruction(RR_STR));
}
CPUInstruction push_ret = cpu_make_instruction(RR_PUSH);
push_ret.param1 = 0x0B;
cpu_emit(chunk, push_ret);
chunk->current_fn = node->data.fn_def.name;
// Compile the body
cpu_compile_node(chunk, node->data.fn_def.body, 0);
chunk->current_fn = NULL;
// Implicit RET
cpu_emit(chunk, cpu_make_instruction(RR_RET));
// Backpatch the jump
chunk->code[jump_idx].param1 = chunk->code_count * 4;
break;
}
case NODE_IF: {
cpu_comment(chunk, "NODE_IF");
ASTNode *cond = node->data.if_statement.cond;
// Compile both operands of the condition into REG[depth] and REG[depth + 1]
cpu_compile_node(chunk, cond->data.binop.left, depth);
cpu_compile_node(chunk, cond->data.binop.right, depth + 1);
// Inverse opcode (jump when the condition is false)
CPUOpCode skip_op;
switch (cond->data.binop.op) {
case '>': {
skip_op = RR_LSE;
break;
}
case '<': {
skip_op = RR_GRE;
break;
}
case 'e': {
skip_op = RR_NEQ;
break;
}
case 'n': {
skip_op = RR_EQ;
break;
}
default: {
printf("error: binop op not supported '%c'\n", cond->data.binop.op);
exit(1);
}
}
// Emit the jump with a placeholder target
CPUInstruction jmp = cpu_make_instruction(skip_op);
jmp.param1 = depth;
jmp.param2 = depth + 1;
jmp.target = 0;
short jmp_idx = cpu_emit(chunk, jmp);
// Compile the body
cpu_compile_node(chunk, node->data.if_statement.body, depth);
// Backpatch
chunk->code[jmp_idx].target = chunk->code_count * 4;
break;
}
case NODE_WHILE: {
cpu_comment(chunk, "NODE_WHILE");
// Remember the loop entry address
short entry = chunk->code_count * 4;
// Check the condition; if it does not hold, exit the loop
ASTNode *cond = node->data.while_loop.cond;
// Compile both operands of the condition into REG[depth] and REG[depth + 1]
cpu_compile_node(chunk, cond->data.binop.left, depth);
cpu_compile_node(chunk, cond->data.binop.right, depth + 1);
// Inverse opcode (jump when the condition is false)
CPUOpCode skip_op;
switch (cond->data.binop.op) {
case '>': {
skip_op = RR_LSE;
break;
}
case '<': {
skip_op = RR_GRE;
break;
}
case 'e': {
skip_op = RR_NEQ;
break;
}
case 'n': {
skip_op = RR_EQ;
break;
}
default: {
printf("error: binop op not supported '%c'\n", cond->data.binop.op);
exit(1);
}
}
// Emit the jump with a placeholder target
CPUInstruction jmp = cpu_make_instruction(skip_op);
jmp.param1 = depth;
jmp.param2 = depth + 1;
jmp.target = 0;
short jmp_idx = cpu_emit(chunk, jmp);
// Compile the body
cpu_compile_node(chunk, node->data.while_loop.body, depth);
// Jump back to the start of the loop
CPUInstruction entry_jmp = cpu_make_instruction(II_ADD);
entry_jmp.param1 = entry;
entry_jmp.target = 0x0E;
cpu_emit(chunk, entry_jmp);
// Backpatch
chunk->code[jmp_idx].target = chunk->code_count * 4;
break;
}
default:
break;
}
return -1;
}
CPUChunk *compileAST(ASTNode *root) {
CPUChunk *chunk = (CPUChunk *)malloc(sizeof(CPUChunk));
memset(chunk, 0, sizeof(CPUChunk));
cpu_compile_node(chunk, root, 0);
// Append HALT
cpu_comment(chunk, "HALT");
CPUInstruction halt = cpu_make_instruction(RR_HALT);
cpu_emit(chunk, halt);
return chunk;
}
void cpu_print_code(CPUChunk *chunk, int show_comments) {
printf("========= BINARY =========\n");
for (int i = 0; i < chunk->code_count; i++) {
if (show_comments) {
// Show comments
for (int c = 0; c < chunk->comments_per_code[i]; c++) {
printf("# %s\n", chunk->comments[i][c]);
}
}
CPUInstruction instr = chunk->code[i];
printf("0x%04X 0x%04X 0x%04X 0x%04X\n", instr.op, instr.param1,
instr.param2, instr.target);
}
}
#endif
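
The `NODE_IF` and `NODE_WHILE` cases above emit a conditional jump before the destination is known and patch it once the body has been compiled. The following is a minimal standalone sketch of that backpatching idea. `Instr`, `Code`, `emit`, `emit_if` and the opcode values are hypothetical stand-ins, not the real `CPUInstruction`/`CPUChunk` API, and the factor of four assumes each instruction occupies four address units, as the `* 4` in the source suggests.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for CPUInstruction and CPUChunk. */
typedef struct {
    unsigned short op, param1, param2, target;
} Instr;

typedef struct {
    Instr code[64];
    int count;
} Code;

/* Assumed values, chosen only for this illustration. */
enum { INSTR_UNITS = 4, OP_SKIP_IF_FALSE = 1, OP_BODY = 2 };

/* Append an instruction and return its index so it can be patched later. */
static int emit(Code *c, Instr in) {
    assert(c->count < 64);
    c->code[c->count] = in;
    return c->count++;
}

/* Emit a conditional skip, then body_len body instructions, then patch the skip. */
static int emit_if(Code *c, int body_len) {
    Instr skip = {OP_SKIP_IF_FALSE, 0, 1, 0}; /* target 0 is a placeholder */
    int skip_idx = emit(c, skip);
    for (int i = 0; i < body_len; i++) {
        Instr body = {OP_BODY, 0, 0, 0};
        emit(c, body);
    }
    /* The first slot after the body is where the skip must land. */
    c->code[skip_idx].target = (unsigned short)(c->count * INSTR_UNITS);
    return skip_idx;
}
```

Returning the index from `emit` rather than a pointer keeps the patch valid even if the code array is later reallocated, which is presumably why `cpu_emit` returns an index as well.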

src/backend/mycpu/opcodes.h (new file, 129 lines)

@@ -0,0 +1,129 @@
#ifndef JLANG_MYCPU_OPCODES_H
#define JLANG_MYCPU_OPCODES_H
typedef enum {
// Register, Register
RR_ADD,
RR_SUB,
RR_MUL,
RR_DIV,
RR_AND,
RR_OR,
RR_NOT,
RR_NAND,
RR_NOR,
RR_XOR,
RR_XNOR,
RR_NEG,
RR_EQ = 0x10,
RR_NEQ,
RR_LS,
RR_LSE,
RR_GR,
RR_GRE,
RR_STR = 0x18,
RR_PUSH,
RR_POP,
RR_CALL = 0x20,
RR_RET,
RR_HALT,
// Immediate, Register
IR_ADD = 0x40,
IR_SUB,
IR_MUL,
IR_DIV,
IR_AND,
IR_OR,
IR_NOT,
IR_NAND,
IR_NOR,
IR_XOR,
IR_XNOR,
IR_NEG,
IR_EQ = 0x50,
IR_NEQ,
IR_LS,
IR_LSE,
IR_GR,
IR_GRE,
IR_STR = 0x58,
IR_PUSH,
IR_POP,
IR_CALL = 0x60,
IR_RET,
IR_HALT,
// Register, Immediate
RI_ADD = 0x80,
RI_SUB,
RI_MUL,
RI_DIV,
RI_AND,
RI_OR,
RI_NOT,
RI_NAND,
RI_NOR,
RI_XOR,
RI_XNOR,
RI_NEG,
RI_EQ = 0x90,
RI_NEQ,
RI_LS,
RI_LSE,
RI_GR,
RI_GRE,
RI_STR = 0x98,
RI_PUSH,
RI_POP,
RI_CALL = 0xA0,
RI_RET,
RI_HALT,
// Immediate, Immediate
II_ADD = 0xC0,
II_SUB,
II_MUL,
II_DIV,
II_AND,
II_OR,
II_NOT,
II_NAND,
II_NOR,
II_XOR,
II_XNOR,
II_NEG,
II_EQ = 0xD0,
II_NEQ,
II_LS,
II_LSE,
II_GR,
II_GRE,
II_STR = 0xD8,
II_PUSH,
II_POP,
II_CALL = 0xE0,
II_RET,
II_HALT,
} CPUOpCode;
typedef struct {
CPUOpCode op;
unsigned short param1;
unsigned short param2;
unsigned short target;
} CPUInstruction;
#endif
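
The enum values above suggest a regular encoding: the four groups start at 0x00, 0x40, 0x80 and 0xC0, so the top two bits appear to select the operand mode and the low six bits the operation. The sketch below decodes an opcode under that reading. It is an inference from the enum layout, not a documented specification of the CPU, and `op_mode`, `op_base` and `op_with_mode` are hypothetical helper names.

```c
#include <assert.h>

/* Assumed bit layout, inferred from the enum values in opcodes.h. */
enum {
    MODE_MASK = 0xC0, /* top two bits: operand mode */
    BASE_MASK = 0x3F, /* low six bits: operation */
    MODE_RR = 0x00,   /* register, register */
    MODE_IR = 0x40,   /* immediate, register */
    MODE_RI = 0x80,   /* register, immediate */
    MODE_II = 0xC0    /* immediate, immediate */
};

/* Extract the operand-mode bits of an 8-bit opcode. */
static unsigned op_mode(unsigned op) {
    assert(op <= 0xFF);
    return op & MODE_MASK;
}

/* Extract the mode-independent operation, e.g. any *_ADD gives 0x00. */
static unsigned op_base(unsigned op) {
    assert(op <= 0xFF);
    return op & BASE_MASK;
}

/* Combine a base operation with an operand mode. */
static unsigned op_with_mode(unsigned base, unsigned mode) {
    assert((base & ~(unsigned)BASE_MASK) == 0);
    assert((mode & ~(unsigned)MODE_MASK) == 0);
    return base | mode;
}
```

With this reading, `RI_PUSH` (0x99) is `RR_PUSH` (0x19) combined with `MODE_RI`, which matches the values the enum assigns.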


@@ -15,9 +15,11 @@ typedef enum {
// Identificadores y keywords // Identificadores y keywords
TOK_ID, // x, foo, mi_var TOK_ID, // x, foo, mi_var
TOK_PRINT, // print
TOK_IF, // if TOK_IF, // if
TOK_WHILE, // while TOK_WHILE, // while
TOK_FN, // fn
TOK_RETURN, // return
TOK_CLASS, // class
// Operadores // Operadores
TOK_ASSIGN, // = TOK_ASSIGN, // =
@@ -34,6 +36,8 @@ typedef enum {
TOK_LPAREN, // ( TOK_LPAREN, // (
TOK_RPAREN, // ) TOK_RPAREN, // )
TOK_COLON, // : TOK_COLON, // :
TOK_COMMA, // ,
TOK_DOT, // .
TOK_NEWLINE, // \n (significativo, como en Python) TOK_NEWLINE, // \n (significativo, como en Python)
TOK_INDENT, // aumento de indentacion TOK_INDENT, // aumento de indentacion
TOK_DEDENT, // reduccion de indentacion TOK_DEDENT, // reduccion de indentacion
@@ -87,12 +91,17 @@ Token *tokenize(const char *source, int *token_count) {
pos++; pos++;
} }
int new_level = spaces / 4; // 4 espacios = 1 nivel int new_level = spaces / 4; // 4 espacios = 1 nivel
if (source[pos] != '\n' && source[pos] != '\0') {
if (new_level > indent_level) { if (new_level > indent_level) {
for (int l = indent_level; l < new_level; l++)
tokens[count++] = make_token(TOK_INDENT, "INDENT"); tokens[count++] = make_token(TOK_INDENT, "INDENT");
} else if (new_level < indent_level) { } else if (new_level < indent_level) {
for (int l = indent_level; l > new_level; l--)
tokens[count++] = make_token(TOK_DEDENT, "DEDENT"); tokens[count++] = make_token(TOK_DEDENT, "DEDENT");
} }
indent_level = new_level; indent_level = new_level;
}
} else if (c == '+') { } else if (c == '+') {
tokens[count++] = make_token(TOK_PLUS, "+"); tokens[count++] = make_token(TOK_PLUS, "+");
pos++; pos++;
@@ -106,8 +115,21 @@ Token *tokenize(const char *source, int *token_count) {
tokens[count++] = make_token(TOK_SLASH, "/"); tokens[count++] = make_token(TOK_SLASH, "/");
pos++; pos++;
} else if (c == '=') { } else if (c == '=') {
if (source[pos + 1] == '=') {
tokens[count++] = make_token(TOK_EQ, "==");
pos += 2;
} else {
tokens[count++] = make_token(TOK_ASSIGN, "="); tokens[count++] = make_token(TOK_ASSIGN, "=");
pos++; pos++;
}
} else if (c == '!') {
if (source[pos + 1] == '=') {
tokens[count++] = make_token(TOK_NEQ, "!=");
pos += 2;
} else {
printf("ERROR: unrecognized character '!' at pos %d\n", pos);
exit(1);
}
} else if (c == '<') { } else if (c == '<') {
tokens[count++] = make_token(TOK_LT, "<"); tokens[count++] = make_token(TOK_LT, "<");
pos++; pos++;
@@ -117,24 +139,51 @@ Token *tokenize(const char *source, int *token_count) {
} else if (c == ':') { } else if (c == ':') {
tokens[count++] = make_token(TOK_COLON, ":"); tokens[count++] = make_token(TOK_COLON, ":");
pos++; pos++;
} else if (c == ',') {
tokens[count++] = make_token(TOK_COMMA, ",");
pos++;
} else if (c == '.') {
tokens[count++] = make_token(TOK_DOT, ".");
pos++;
} else if (c == '(') {
tokens[count++] = make_token(TOK_LPAREN, "(");
pos++;
} else if (c == ')') {
tokens[count++] = make_token(TOK_RPAREN, ")");
pos++;
} else if (c == '"') {
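// Read everything up to the next '"'
pos++; // consume the opening '"'
int start = pos;
while (source[pos] != '"' && source[pos] != '\0')
pos++;
if (source[pos] == '\0') {
printf("ERROR: unterminated string literal\n");
exit(1);
}
tokens[count++] = make_token(TOK_STRING, substr(source, start, pos));
pos++; // consume the closing '"'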
} else if (c >= '0' && c <= '9') { } else if (c >= '0' && c <= '9') {
// Leer todos los digitos consecutivos // Leer todos los digitos consecutivos
int start = pos; int start = pos;
while (source[pos] >= '0' && source[pos] <= '9') while (source[pos] >= '0' && source[pos] <= '9')
pos++; pos++;
tokens[count++] = make_token(TOK_INT, substr(source, start, pos)); tokens[count++] = make_token(TOK_INT, substr(source, start, pos));
} else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) { } else if ((c >= 'a' && c <= 'z') || (c == '_') || (c >= 'A' && c <= 'Z')) {
// Leer todos los caracteres consecutivos // Leer todos los caracteres consecutivos
int start = pos; int start = pos;
while (isalnum(source[pos])) while (isalnum(source[pos]) || source[pos] == '_')
pos++; pos++;
char *word = substr(source, start, pos); char *word = substr(source, start, pos);
// tokens[count++] = make_token(TOK_ID, word);
// Comprobar si es una keyword reservada // Comprobar si es una keyword reservada
if (strcmp(word, "print") == 0) { if (strcmp(word, "if") == 0) {
tokens[count++] = make_token(TOK_PRINT, word); tokens[count++] = make_token(TOK_IF, word);
} else if (strcmp(word, "while") == 0) { } else if (strcmp(word, "while") == 0) {
tokens[count++] = make_token(TOK_WHILE, word); tokens[count++] = make_token(TOK_WHILE, word);
} else if (strcmp(word, "fn") == 0) {
tokens[count++] = make_token(TOK_FN, word);
} else if (strcmp(word, "return") == 0) {
tokens[count++] = make_token(TOK_RETURN, word);
} else if (strcmp(word, "class") == 0) {
tokens[count++] = make_token(TOK_CLASS, word);
} else { } else {
tokens[count++] = make_token(TOK_ID, word); tokens[count++] = make_token(TOK_ID, word);
} }
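
The indentation hunk above changes the lexer to emit one `TOK_INDENT` or `TOK_DEDENT` per level crossed, and to leave the level untouched on blank lines. A minimal sketch of that rule follows, reduced to a function that returns how many tokens to emit. `indent_delta` and `SPACES_PER_LEVEL` are hypothetical names introduced for illustration; the real lexer does this inline inside `tokenize`.

```c
#include <assert.h>

enum { SPACES_PER_LEVEL = 4 }; /* 4 spaces = 1 level, as in the lexer */

/*
 * Returns the number of tokens to emit for a new line:
 * positive = that many INDENTs, negative = that many DEDENTs, 0 = none.
 * next_char is the first character after the leading spaces.
 */
static int indent_delta(int spaces, char next_char, int *indent_level) {
    assert(spaces >= 0);
    /* Blank line or end of input: keep the current level unchanged. */
    if (next_char == '\n' || next_char == '\0')
        return 0;
    int new_level = spaces / SPACES_PER_LEVEL;
    int delta = new_level - *indent_level;
    *indent_level = new_level;
    return delta;
}
```

Emitting one token per level matters when a line drops several levels at once: each enclosing block in the parser waits for its own `TOK_DEDENT`.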


@@ -12,10 +12,17 @@ typedef enum {
NODE_VAR, // referencia a variable NODE_VAR, // referencia a variable
NODE_ASSIGN, // asignacion: x = expr NODE_ASSIGN, // asignacion: x = expr
NODE_BINOP, // operacion binaria: a + b NODE_BINOP, // operacion binaria: a + b
NODE_PRINT, // print(expr) NODE_NOP, // noop
NODE_IF, // if cond: bloque NODE_IF, // if cond: bloque
NODE_WHILE, // while cond: bloque NODE_WHILE, // while cond: bloque
NODE_BLOCK, // secuencia de statements NODE_BLOCK, // secuencia de statements
NODE_CALL,
NODE_FN_DEF, // function definition
NODE_RETURN, // return expr
NODE_CLASS_DEF, // class definition
NODE_DOT_ACCESS,
NODE_DOT_ASSIGN,
NODE_METHOD_CALL,
} NodeType; } NodeType;
typedef struct ASTNode { typedef struct ASTNode {
@@ -43,6 +50,44 @@ typedef struct ASTNode {
struct ASTNode *cond; struct ASTNode *cond;
struct ASTNode *body; struct ASTNode *body;
} while_loop; // NODE_WHILE } while_loop; // NODE_WHILE
struct {
struct ASTNode *cond;
struct ASTNode *body;
} if_statement; // NODE_IF
struct {
char *name;
struct ASTNode **args;
int arg_count;
} call;
struct {
char *name;
char **params;
int param_count;
struct ASTNode *body;
} fn_def; // NODE_FN_DEF
struct {
struct ASTNode *value; // return expression
} ret; // NODE_RETURN
struct {
char *name;
struct ASTNode **methods;
int method_count;
} class_def;
struct {
struct ASTNode *object;
char *field;
} dot_access;
struct {
struct ASTNode *object;
char *field;
struct ASTNode *value;
} dot_assign;
struct {
struct ASTNode *object;
char *method;
struct ASTNode **args;
int arg_count;
} method_call;
} data; } data;
} ASTNode; } ASTNode;
@@ -54,19 +99,106 @@ ASTNode *make_node(NodeType type) {
int pos = 0; int pos = 0;
ASTNode *parse_term(Token *tokens) { ASTNode *parse_expr(Token *tokens);
ASTNode *parse_term(Token *tokens);
ASTNode *parse_factor(Token *tokens);
ASTNode *parse_factor(Token *tokens) {
if (tokens[pos].type == TOK_INT) { if (tokens[pos].type == TOK_INT) {
ASTNode *node = make_node(NODE_INT_LIT); ASTNode *node = make_node(NODE_INT_LIT);
node->data.int_val = atoi(tokens[pos].value); node->data.int_val = atoi(tokens[pos].value);
pos++; pos++;
return node; return node;
} else if (tokens[pos].type == TOK_ID) { } else if (tokens[pos].type == TOK_STRING) {
ASTNode *node = make_node(NODE_VAR); ASTNode *node = make_node(NODE_STRING_LIT);
node->data.string_val = tokens[pos].value; node->data.string_val = tokens[pos].value;
pos++; pos++;
return node; return node;
} else if (tokens[pos].type == TOK_ID) {
ASTNode *node;
if (tokens[pos + 1].type == TOK_LPAREN) {
// Function call
char *name = tokens[pos].value;
pos++; // consume ID
pos++; // consume (
// Parse the arguments
ASTNode **args =
(ASTNode **)malloc(sizeof(ASTNode *) * 16); // Max 16 arguments
int arg_count = 0;
if (tokens[pos].type != TOK_RPAREN) {
args[arg_count++] = parse_expr(tokens);
while (tokens[pos].type == TOK_COMMA) {
pos++; // consume ","
args[arg_count++] = parse_expr(tokens);
} }
printf("ERROR: esperaba INT o ID, encontré tipo %d\n", tokens[pos].type); }
pos++; // consume ")"
node = make_node(NODE_CALL);
node->data.call.name = name;
node->data.call.args = args;
node->data.call.arg_count = arg_count;
} else {
node = make_node(NODE_VAR);
node->data.string_val = tokens[pos].value;
pos++;
}
// Postfix loop for dot access
while (tokens[pos].type == TOK_DOT) {
pos++; // consume .
char *field = tokens[pos].value;
pos++; // consume field name
if (tokens[pos].type == TOK_LPAREN) {
// obj.method(args...)
pos++; // consume '('
ASTNode **args = (ASTNode **)malloc(sizeof(ASTNode *) * 16);
int arg_count = 0;
if (tokens[pos].type != TOK_RPAREN) {
args[arg_count++] = parse_expr(tokens);
while (tokens[pos].type == TOK_COMMA) {
pos++;
args[arg_count++] = parse_expr(tokens);
}
}
pos++; // consume ')'
ASTNode *mc = make_node(NODE_METHOD_CALL);
mc->data.method_call.object = node;
mc->data.method_call.method = field;
mc->data.method_call.args = args;
mc->data.method_call.arg_count = arg_count;
node = mc;
} else {
// obj.field
ASTNode *da = make_node(NODE_DOT_ACCESS);
da->data.dot_access.object = node;
da->data.dot_access.field = field;
node = da;
}
}
return node;
} else if (tokens[pos].type == TOK_LPAREN) {
pos++; // consume (
ASTNode *expr = parse_expr(tokens);
pos++; // consume )
return expr;
} else if (tokens[pos].type == TOK_MINUS) {
pos++; // consume '-'
ASTNode *term = parse_term(tokens);
ASTNode *neg = make_node(NODE_BINOP);
neg->data.binop.op = '-';
neg->data.binop.left = make_node(NODE_INT_LIT);
neg->data.binop.left->data.int_val = 0;
neg->data.binop.right = term;
return neg;
}
printf("ERROR: expected INT or ID, found type %d value: %s\n",
tokens[pos].type, tokens[pos].value);
exit(1); exit(1);
} }
@@ -74,9 +206,16 @@ ASTNode *parse_expr(Token *tokens) {
ASTNode *left = parse_term(tokens); ASTNode *left = parse_term(tokens);
while (tokens[pos].type == TOK_PLUS || tokens[pos].type == TOK_MINUS || while (tokens[pos].type == TOK_PLUS || tokens[pos].type == TOK_MINUS ||
tokens[pos].type == TOK_STAR || tokens[pos].type == TOK_SLASH || tokens[pos].type == TOK_LT || tokens[pos].type == TOK_GT ||
tokens[pos].type == TOK_LT || tokens[pos].type == TOK_GT) { tokens[pos].type == TOK_EQ || tokens[pos].type == TOK_NEQ) {
char op = tokens[pos].value[0]; // +,-,*,/ char op;
if (tokens[pos].type == TOK_EQ) {
op = 'e';
} else if (tokens[pos].type == TOK_NEQ) {
op = 'n';
} else {
op = tokens[pos].value[0]; // +,-,*,/
}
pos++; pos++;
ASTNode *right = parse_term(tokens); ASTNode *right = parse_term(tokens);
@@ -89,8 +228,69 @@ ASTNode *parse_expr(Token *tokens) {
return left; return left;
} }
ASTNode *parse_term(Token *tokens) {
ASTNode *left = parse_factor(tokens);
while (tokens[pos].type == TOK_STAR || tokens[pos].type == TOK_SLASH) {
char op = tokens[pos].value[0];
pos++;
ASTNode *right = parse_factor(tokens);
ASTNode *binop = make_node(NODE_BINOP);
binop->data.binop.op = op;
binop->data.binop.left = left;
binop->data.binop.right = right;
left = binop;
}
return left;
}
ASTNode *parse_statement(Token *tokens) { ASTNode *parse_statement(Token *tokens) {
if (tokens[pos].type == TOK_ID) { if (tokens[pos].type == TOK_ID) {
if (tokens[pos + 1].type == TOK_DOT) {
ASTNode *expr = parse_expr(tokens);
if (tokens[pos].type == TOK_ASSIGN) {
// dot_assign: self.name = expr
pos++; // consume '='
ASTNode *value = parse_expr(tokens);
ASTNode *node = make_node(NODE_DOT_ASSIGN);
node->data.dot_assign.object = expr->data.dot_access.object;
node->data.dot_assign.field = expr->data.dot_access.field;
node->data.dot_assign.value = value;
return node;
}
// no '=' follows, so this is a method call such as d.speak()
return expr;
}
if (tokens[pos + 1].type == TOK_LPAREN) {
// It is a function call
char *name = tokens[pos].value;
pos++; // consume ID
pos++; // consume "("
// Parse the arguments
ASTNode **args =
(ASTNode **)malloc(sizeof(ASTNode *) * 16); // Max 16 arguments
int arg_count = 0;
if (tokens[pos].type != TOK_RPAREN) {
args[arg_count++] = parse_expr(tokens);
while (tokens[pos].type == TOK_COMMA) {
pos++; // consume ","
args[arg_count++] = parse_expr(tokens);
}
}
pos++; // consume ")"
ASTNode *node = make_node(NODE_CALL);
node->data.call.name = name;
node->data.call.args = args;
node->data.call.arg_count = arg_count;
return node;
}
char *name = tokens[pos].value; char *name = tokens[pos].value;
pos++; // consumir ID pos++; // consumir ID
pos++; // consumir "=" pos++; // consumir "="
@@ -101,13 +301,21 @@ ASTNode *parse_statement(Token *tokens) {
node->data.assign.value = value; node->data.assign.value = value;
return node; return node;
} }
if (tokens[pos].type == TOK_PRINT) {
pos++; // consumir "print"
ASTNode *expr = parse_expr(tokens);
ASTNode *node = make_node(NODE_PRINT); // Parse comments
node->data.print.expr = expr; if (tokens[pos].type == TOK_SLASH) {
return node; if (tokens[pos + 1].type == TOK_SLASH) {
pos++; // consumir /
pos++; // consumir /
// Consumir hasta NewLine
while (tokens[pos].type != TOK_NEWLINE)
pos++;
pos++; // consumir newline
return make_node(NODE_NOP);
}
} }
if (tokens[pos].type == TOK_WHILE) { if (tokens[pos].type == TOK_WHILE) {
@@ -124,7 +332,7 @@ ASTNode *parse_statement(Token *tokens) {
while (tokens[pos].type != TOK_DEDENT) { while (tokens[pos].type != TOK_DEDENT) {
body->data.block.stmts[body->data.block.count++] = body->data.block.stmts[body->data.block.count++] =
parse_statement(tokens); parse_statement(tokens);
if (tokens[pos].type == TOK_NEWLINE) { while (tokens[pos].type == TOK_NEWLINE) {
pos++; pos++;
} }
} }
@@ -135,6 +343,109 @@ ASTNode *parse_statement(Token *tokens) {
node->data.while_loop.body = body; node->data.while_loop.body = body;
return node; return node;
} }
if (tokens[pos].type == TOK_IF) {
pos++; // consume if
ASTNode *cond = parse_expr(tokens);
pos++; // consume :
pos++; // consume NEWLINE
pos++; // consume INDENT
// Parse the block of statements up to DEDENT
ASTNode *body = make_node(NODE_BLOCK);
body->data.block.stmts = (ASTNode **)malloc(sizeof(ASTNode *) * 256);
body->data.block.count = 0;
while (tokens[pos].type != TOK_DEDENT) {
body->data.block.stmts[body->data.block.count++] =
parse_statement(tokens);
while (tokens[pos].type == TOK_NEWLINE) {
pos++;
}
}
pos++; // consume DEDENT
ASTNode *node = make_node(NODE_IF);
node->data.if_statement.cond = cond; // use the NODE_IF member, not while_loop
node->data.if_statement.body = body;
return node;
}
if (tokens[pos].type == TOK_CLASS) {
pos++; // consume 'class'
char *name = tokens[pos].value;
pos++; // consume the name
pos++; // consume :
pos++; // consume NEWLINE
pos++; // consume INDENT
ASTNode **methods = (ASTNode **)malloc(sizeof(ASTNode *) * 16);
int method_count = 0;
while (tokens[pos].type != TOK_DEDENT) {
methods[method_count++] = parse_statement(tokens);
while (tokens[pos].type == TOK_NEWLINE) {
pos++;
}
}
pos++; // consume DEDENT
ASTNode *node = make_node(NODE_CLASS_DEF);
node->data.class_def.name = name;
node->data.class_def.methods = methods;
node->data.class_def.method_count = method_count;
return node;
}
if (tokens[pos].type == TOK_FN) {
pos++; // consume "fn"
char *name = tokens[pos].value;
pos++; // consume the name
pos++; // consume "("
// Parse the parameters (max 16)
char **params = malloc(sizeof(char *) * 16);
int param_count = 0;
if (tokens[pos].type != TOK_RPAREN) {
params[param_count++] = tokens[pos].value;
pos++;
while (tokens[pos].type == TOK_COMMA) {
pos++; // consume ","
params[param_count++] = tokens[pos].value;
pos++;
}
}
pos++; // consume ")"
pos++; // consume ":"
pos++; // consume NEWLINE
pos++; // consume INDENT
// Parse the block of statements up to DEDENT
ASTNode *body = make_node(NODE_BLOCK);
body->data.block.stmts = (ASTNode **)malloc(sizeof(ASTNode *) * 256);
body->data.block.count = 0;
while (tokens[pos].type != TOK_DEDENT) {
body->data.block.stmts[body->data.block.count++] =
parse_statement(tokens);
while (tokens[pos].type == TOK_NEWLINE) {
pos++;
}
}
pos++; // consume DEDENT
ASTNode *node = make_node(NODE_FN_DEF);
node->data.fn_def.name = name;
node->data.fn_def.params = params;
node->data.fn_def.param_count = param_count;
node->data.fn_def.body = body;
return node;
}
if (tokens[pos].type == TOK_RETURN) {
pos++;
ASTNode *node = make_node(NODE_RETURN);
node->data.ret.value = parse_expr(tokens);
return node;
}
printf("ERROR: statement inesperado\n"); printf("ERROR: statement inesperado\n");
exit(1); exit(1);
} }
@@ -145,7 +456,7 @@ ASTNode *parse(Token *tokens, int token_count) {
block->data.block.count = 0; block->data.block.count = 0;
while (pos < token_count) { while (pos < token_count) {
if (tokens[pos].type == TOK_NEWLINE) { while (tokens[pos].type == TOK_NEWLINE) {
pos++; // Saltar newlines sueltos pos++; // Saltar newlines sueltos
continue; continue;
} }
@@ -202,11 +513,6 @@ void ast_print(ASTNode *node, const char *prefix, int is_last) {
ast_print(node->data.binop.right, new_prefix, 1); ast_print(node->data.binop.right, new_prefix, 1);
break; break;
case NODE_PRINT:
printf("NODE_PRINT\n");
ast_print(node->data.print.expr, new_prefix, 1);
break;
case NODE_BLOCK: case NODE_BLOCK:
printf("NODE_BLOCK\n"); printf("NODE_BLOCK\n");
for (int i = 0; i < node->data.block.count; i++) { for (int i = 0; i < node->data.block.count; i++) {
@@ -215,6 +521,69 @@ void ast_print(ASTNode *node, const char *prefix, int is_last) {
} }
break; break;
case NODE_IF:
printf("NODE_IF\n");
ast_print(node->data.if_statement.cond, new_prefix, 0);
ast_print(node->data.if_statement.body, new_prefix, 1);
break;
case NODE_NOP:
printf("NODE_NOP\n");
break;
case NODE_CALL:
printf("NODE_CALL(\"%s\")\n", node->data.call.name);
for (int i = 0; i < node->data.call.arg_count; i++) {
ast_print(node->data.call.args[i], new_prefix,
i == node->data.call.arg_count - 1);
}
break;
case NODE_FN_DEF:
printf("NODE_FN_DEF(\"%s\"", node->data.fn_def.name);
for (int i = 0; i < node->data.fn_def.param_count; i++) {
printf(", %s", node->data.fn_def.params[i]);
}
printf(")\n");
ast_print(node->data.fn_def.body, new_prefix, 1);
break;
case NODE_RETURN:
printf("NODE_RETURN\n");
if (node->data.ret.value) {
ast_print(node->data.ret.value, new_prefix, 1);
}
break;
case NODE_CLASS_DEF:
printf("NODE_CLASS_DEF(\"%s\")\n", node->data.class_def.name);
for (int i = 0; i < node->data.class_def.method_count; i++) {
ast_print(node->data.class_def.methods[i], new_prefix,
i == node->data.class_def.method_count - 1);
}
break;
case NODE_DOT_ACCESS:
printf("NODE_DOT_ACCESS(.%s)\n", node->data.dot_access.field);
ast_print(node->data.dot_access.object, new_prefix, 1);
break;
case NODE_DOT_ASSIGN:
printf("NODE_DOT_ASSIGN(.%s)\n", node->data.dot_assign.field);
ast_print(node->data.dot_assign.object, new_prefix, 0);
ast_print(node->data.dot_assign.value, new_prefix, 1);
break;
case NODE_METHOD_CALL:
printf("NODE_METHOD_CALL(.%s, %d args)\n", node->data.method_call.method,
node->data.method_call.arg_count);
ast_print(node->data.method_call.object, new_prefix,
node->data.method_call.arg_count == 0);
for (int i = 0; i < node->data.method_call.arg_count; i++) {
ast_print(node->data.method_call.args[i], new_prefix,
i == node->data.method_call.arg_count - 1);
}
break;
default: default:
printf("UNKNOWN\n"); printf("UNKNOWN\n");
break; break;
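
The parser diff above splits expression parsing into `parse_expr` (`+`, `-`), `parse_term` (`*`, `/`) and `parse_factor` (literals, calls, parentheses), which is what gives `*` and `/` higher precedence. The sketch below shows the same three-level layering on a plain string, evaluating directly instead of building `ASTNode`s. All names here (`eval_string`, `parse_expr_value`, and so on) are hypothetical, and unary minus is bound to a factor for simplicity, whereas the diff applies it to a whole term.

```c
#include <assert.h>
#include <ctype.h>

static const char *p; /* cursor into the expression being parsed */

static int parse_expr_value(void);

static void skip_spaces(void) {
    while (*p == ' ')
        p++;
}

/* factor := INT | '(' expr ')' | '-' factor */
static int parse_factor_value(void) {
    skip_spaces();
    if (*p == '(') {
        p++; /* consume '(' */
        int v = parse_expr_value();
        skip_spaces();
        assert(*p == ')');
        p++; /* consume ')' */
        return v;
    }
    if (*p == '-') {
        p++; /* consume '-' */
        return -parse_factor_value();
    }
    assert(isdigit((unsigned char)*p));
    int v = 0;
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

/* term := factor (('*' | '/') factor)*   (binds tighter than + and -) */
static int parse_term_value(void) {
    int left = parse_factor_value();
    for (;;) {
        skip_spaces();
        char op = *p;
        if (op != '*' && op != '/')
            return left;
        p++;
        int right = parse_factor_value();
        left = (op == '*') ? left * right : left / right;
    }
}

/* expr := term (('+' | '-') term)* */
static int parse_expr_value(void) {
    int left = parse_term_value();
    for (;;) {
        skip_spaces();
        char op = *p;
        if (op != '+' && op != '-')
            return left;
        p++;
        int right = parse_term_value();
        left = (op == '+') ? left + right : left - right;
    }
}

/* Entry point: evaluate a whole expression string. */
static int eval_string(const char *src) {
    p = src;
    return parse_expr_value();
}
```

Because `parse_expr_value` only ever combines complete terms, `8 + 2 * 4` groups as `8 + (2 * 4)`; parentheses re-enter the top level from inside a factor.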


@@ -1,15 +1,18 @@
#include "vm/eval.h" #include "backend/eval/eval.h"
#include "backend/bytecode/compiler.h"
#include "backend/bytecode/vm.h"
#include "backend/mycpu/gencode.h"
int main(int argc, char **argv) { int main(int argc, char **argv) {
if (argc != 2) { if (argc != 3) {
printf("usage: %s <path to .j file>\n", argv[0]); printf("usage: %s eval|vm|asm|mycpu <path to .j file>\n", argv[0]);
exit(1); exit(1);
} }
// Creamos un allocator // Creamos un allocator
JLANG_memory_allocator *allocPtr = JLANG_CreateAllocator(); JLANG_memory_allocator *allocPtr = JLANG_CreateAllocator();
// Read file from argv // Read file from argv
FILE *fptr = fopen(argv[1], "r"); FILE *fptr = fopen(argv[2], "r");
if (fptr == NULL) { if (fptr == NULL) {
printf("error leyendo: %s\n", argv[1]); printf("error leyendo: %s\n", argv[2]);
exit(1); exit(1);
@@ -24,20 +27,40 @@ int main(int argc, char **argv) {
fclose(fptr); fclose(fptr);
printf("=== CODE ===\n");
printf("%s\n", buff); printf("%s\n", buff);
// Lexer test // Lexer test
int totalTokens = 0; int totalTokens = 0;
Token *tokens = tokenize(buff, &totalTokens); Token *tokens = tokenize(buff, &totalTokens);
printf("totalTokens=%d\n", totalTokens); printf("=== INFO ===\n");
ASTNode *block = parse(tokens, totalTokens); ASTNode *block = parse(tokens, totalTokens);
ast_debug(block); ast_debug(block);
if (strcmp(argv[1], "eval") == 0) {
Environment env = {0}; Environment env = {0};
eval(block, &env, allocPtr, 0, 1); eval(block, &env, allocPtr, 0, 1);
printf("heapSize=%zu\n", allocPtr->size); // printf("\nheapSize=%zu\n", allocPtr->size);
// JLANG_visualize(allocPtr); // JLANG_visualize(allocPtr);
} else if (strcmp(argv[1], "vm") == 0){
Chunk* chunk = compile(block);
VM vm = {0};
vm.chunk = chunk;
vm.allocator = allocPtr;
print_chunk(chunk);
run_vm(&vm);
// printf("\n");
// JLANG_visualize(allocPtr);
} else if (strcmp(argv[1], "mycpu") == 0){
CPUChunk* chunk = compileAST(block);
cpu_print_code(chunk, 1);
} else {
printf("panic: WIP\n");
}
return 0; return 0;
} }


@@ -49,9 +49,7 @@ void *JLANG_CreateAllocator() {
allocator->size = 1 * 1024; allocator->size = 1 * 1024;
// ensure all memory is zero // ensure all memory is zero
for (int i = 0; i < 1 * 1024; i++) { memset(allocator->memory, 0, 1024);
allocator->memory[i] = 0;
}
return allocator; return allocator;
} }


@@ -4,6 +4,7 @@
#include "../objects/object.h" #include "../objects/object.h"
#include "allocator.h" #include "allocator.h"
void gc_collect(JLANG_memory_allocator *allocPtr, size_t *roots, void gc_collect(JLANG_memory_allocator *allocPtr, size_t *roots,
int root_count) { int root_count) {
// Stage 1. Mark blocks // Stage 1. Mark blocks
@@ -29,6 +30,18 @@ void gc_collect(JLANG_memory_allocator *allocPtr, size_t *roots,
objPtr->data.string_val.chars - objPtr->data.string_val.chars -
sizeof(JLANG_metadata)); sizeof(JLANG_metadata));
itemsHeader->marked = 1; itemsHeader->marked = 1;
} else if (objPtr->type == OBJ_INSTANCE) {
JLANG_metadata *namesHeader =
(JLANG_metadata *)((char *)allocPtr->memory +
objPtr->data.instance_val.field_names -
sizeof(JLANG_metadata));
namesHeader->marked = 1;
JLANG_metadata *valuesHeader =
(JLANG_metadata *)((char *)allocPtr->memory +
objPtr->data.instance_val.field_values -
sizeof(JLANG_metadata));
valuesHeader->marked = 1;
} }
} }
@@ -83,13 +96,13 @@ void gc_collect(JLANG_memory_allocator *allocPtr, size_t *roots,
} }
// Create valid header // Create valid header
currentHead = (JLANG_metadata *) ((char *)allocPtr->memory + startIndex); currentHead = (JLANG_metadata *)((char *)allocPtr->memory + startIndex);
currentHead->size = (endIndex - startIndex) - sizeof(JLANG_metadata); currentHead->size = (endIndex - startIndex) - sizeof(JLANG_metadata);
} }
currentHead = (JLANG_metadata *)((char *)currentHead + currentHead->size + currentHead = (JLANG_metadata *)((char *)currentHead + currentHead->size +
sizeof(JLANG_metadata)); sizeof(JLANG_metadata));
} }
} }
#endif #endif


@@ -3,11 +3,10 @@
#include "../memory/allocator.h" #include "../memory/allocator.h"
#define JLANG_RESOLVE(alloc, offset) \ #define JLANG_RESOLVE(alloc, offset) \
((void *)(((JLANG_memory_allocator *)(alloc))->memory + (offset))) ((void *)(((JLANG_memory_allocator *)(alloc))->memory + (offset)))
typedef enum { OBJ_INT, OBJ_FLOAT, OBJ_STRING, OBJ_LIST, OBJ_NONE } ObjectType; typedef enum { OBJ_INT, OBJ_FLOAT, OBJ_STRING, OBJ_LIST, OBJ_NONE, OBJ_INSTANCE } ObjectType;
typedef struct Object { typedef struct Object {
ObjectType type; ObjectType type;
@@ -23,9 +22,38 @@ typedef struct Object {
int count; int count;
int capacity; int capacity;
} list_val; } list_val;
struct {
int class_index; // index into Chunk.classes[]
size_t field_names; // heap offset -> array of int
size_t field_values; // heap offset -> array of Value
int field_count;
int field_capacity;
} instance_val;
} data; } data;
} Object; } Object;
size_t obj_new_instance(void *allocator, int class_index, int capacity, size_t value_size) {
size_t offset = JLANG_malloc(allocator, sizeof(Object));
Object *objPtr = (Object *)JLANG_RESOLVE(allocator, offset);
objPtr->type = OBJ_INSTANCE;
objPtr->data.instance_val.class_index = class_index;
objPtr->data.instance_val.field_count = 0;
objPtr->data.instance_val.field_capacity = capacity;
// allocate the names array
size_t namesOffset = JLANG_malloc(allocator, capacity * sizeof(int));
objPtr = (Object *) JLANG_RESOLVE(allocator, offset); // re-resolve
// allocate the values array
size_t valuesOffset = JLANG_malloc(allocator, capacity * value_size);
objPtr = (Object *) JLANG_RESOLVE(allocator, offset); // re-resolve
objPtr->data.instance_val.field_names = namesOffset;
objPtr->data.instance_val.field_values = valuesOffset;
return offset;
}
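
`obj_new_instance` re-resolves `objPtr` after every `JLANG_malloc` call. The sketch below illustrates one plausible reason, under the assumption that an allocation may move the backing buffer: raw pointers can go stale while offsets stay valid. `GrowHeap`, `heap_alloc`, `resolve`, `Pair`, and `new_pair` are hypothetical names for this illustration, and growing via `realloc` is an assumption, not something the diff confirms about `JLANG_malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable heap: callers hold offsets, never long-lived pointers. */
typedef struct {
    char *memory;
    size_t used;
    size_t capacity;
} GrowHeap;

/* Turn an offset into a pointer that is valid only until the next allocation. */
static void *resolve(GrowHeap *h, size_t offset) {
    return h->memory + offset;
}

/* May realloc (and therefore move) the buffer; returns the block's offset. */
static size_t heap_alloc(GrowHeap *h, size_t size) {
    size = (size + 7) / 8 * 8; /* keep offsets 8-byte aligned */
    if (h->used + size > h->capacity) {
        size_t new_cap = h->capacity ? h->capacity * 2 : 64;
        while (new_cap < h->used + size) {
            new_cap *= 2;
        }
        char *moved = realloc(h->memory, new_cap);
        assert(moved != NULL);
        h->memory = moved;
        h->capacity = new_cap;
    }
    size_t offset = h->used;
    h->used += size;
    return offset;
}

/* Cut-down analogue of instance_val: two side arrays referenced by offset. */
typedef struct {
    size_t names;
    size_t values;
    int capacity;
} Pair;

static size_t new_pair(GrowHeap *h, int capacity) {
    size_t self = heap_alloc(h, sizeof(Pair));
    size_t names = heap_alloc(h, (size_t)capacity * sizeof(int));
    size_t values = heap_alloc(h, (size_t)capacity * sizeof(double));
    /* Resolve only after the last allocation, so the pointer cannot be stale. */
    Pair *p = (Pair *)resolve(h, self);
    p->names = names;
    p->values = values;
    p->capacity = capacity;
    return self;
}
```

Resolving once after the final allocation is equivalent to re-resolving after each one, as the diff does. Either way, the design point is that no pointer obtained before an allocation is dereferenced after it.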
size_t obj_new_int(void *allocator, int value) { size_t obj_new_int(void *allocator, int value) {
// Object *objPtr = (Object *)JLANG_malloc(allocator, sizeof(Object)); // Object *objPtr = (Object *)JLANG_malloc(allocator, sizeof(Object));
size_t offset = JLANG_malloc(allocator, sizeof(Object)); size_t offset = JLANG_malloc(allocator, sizeof(Object));
@@ -91,10 +119,16 @@ void obj_free(void *allocator, size_t offset) {
JLANG_free(allocator, obj->data.list_val.items); JLANG_free(allocator, obj->data.list_val.items);
} }
if (obj->type == OBJ_INSTANCE) {
JLANG_free(allocator, obj->data.instance_val.field_names);
JLANG_free(allocator, obj->data.instance_val.field_values);
}
JLANG_free(allocator, offset); JLANG_free(allocator, offset);
} }
void obj_print(void *allocator, size_t offset, const char *preffix) { void obj_print(void *allocator, size_t offset, const char *preffix,
const char *suffix) {
Object *obj = (Object *)JLANG_RESOLVE(allocator, offset); Object *obj = (Object *)JLANG_RESOLVE(allocator, offset);
switch (obj->type) { switch (obj->type) {
@@ -117,7 +151,7 @@ void obj_print(void *allocator, size_t offset, const char *preffix) {
if (items[i] == offset) { if (items[i] == offset) {
printf("<self:0x%zu>", offset); printf("<self:0x%zu>", offset);
} else { } else {
obj_print(allocator, items[i], "\""); obj_print(allocator, items[i], "\"", "\"");
} }
if (i < obj->data.list_val.capacity - 1) { if (i < obj->data.list_val.capacity - 1) {
@@ -133,8 +167,8 @@ void obj_print(void *allocator, size_t offset, const char *preffix) {
} }
printf("%s", (char *)JLANG_RESOLVE(allocator, obj->data.string_val.chars)); printf("%s", (char *)JLANG_RESOLVE(allocator, obj->data.string_val.chars));
if (strcmp(preffix, "") != 0) { if (strcmp(suffix, "") != 0) {
printf("%s", preffix); printf("%s", suffix);
} }
break; break;
default: default: