Issue with using the codellama-7b model
#13 · opened by RyanAX
I have set up the codellama-7b model locally and used the official example, but the final result does not meet expectations. Here is the code:
```python
import time

import torch
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

codeLlama_tokenizer = CodeLlamaTokenizer.from_pretrained("./CodeLlama-7b-hf", padding_side='left')
codeLlama_model = LlamaForCausalLM.from_pretrained("./CodeLlama-7b-hf")
codeLlama_model.to(device='cuda:0', dtype=torch.bfloat16)

text = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''

start_time = time.time()
input_ids = codeLlama_tokenizer(text, return_tensors="pt")["input_ids"]
input_ids = input_ids.to('cuda')
generated_ids = codeLlama_model.generate(input_ids, max_new_tokens=200, do_sample=True, top_p=0.9, temperature=0.1, num_return_sequences=1, repetition_penalty=1.05, eos_token_id=codeLlama_tokenizer.eos_token_id, pad_token_id=codeLlama_tokenizer.pad_token_id)
filling = codeLlama_tokenizer.batch_decode(generated_ids[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(filling)
```
The output of the code is:
```
Remove non-ascii characters from a string. """
    result = ""
    for c in s:
        if ord(c) < 128:
            result += c
}
public void setId(String id) {
    this.id = id;
}
public String getName() {
    return name;
}
public void setName(String name) {
    this.name = name;
}
public String getDescription() {
    return description;
}
public void setDescription(String description) {
    this.description = description;
}
public String getType() {
    return type;
}
```
There are two issues with the generated code that don't meet expectations:
1. It doesn't take the suffix into account; everything after `<FILL_ME>` seems to be ignored.
2. After completing the desired part of the code, it appends a lot of unnecessary, unrelated code.
Is this behavior normal? Is there any way to improve it?
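
For reference, one sanity check I plan to run is printing the encoded prompt, to confirm that `<FILL_ME>` is actually expanded into an infill prompt that includes the suffix. This is only a sketch; the `▁<PRE>` / `▁<SUF>` / `▁<MID>` token names assume the default special tokens of transformers' `CodeLlamaTokenizer`:

```python
# Inspect how the tokenizer encodes the <FILL_ME> prompt. For an
# infill-capable checkpoint the sequence should contain the control tokens
# ▁<PRE>, ▁<SUF>, and ▁<MID> (names assume the default CodeLlamaTokenizer
# configuration), with the suffix ("return result") encoded between ▁<SUF>
# and ▁<MID>. If these tokens are missing, the suffix never reaches the
# model, which would explain issue 1.
tokens = codeLlama_tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
print(tokens)
```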
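
As a possible mitigation for issue 2, I am also considering stopping generation at the end-of-infill marker instead of the regular EOS token. Again just a sketch, assuming the tokenizer's default `▁<EOT>` special token:

```python
# Stop generation at the infill end marker ▁<EOT> rather than </s>, so the
# model does not keep sampling unrelated code after finishing the middle
# part. (Token name assumes the default CodeLlamaTokenizer configuration.)
eot_id = codeLlama_tokenizer.convert_tokens_to_ids("▁<EOT>")
generated_ids = codeLlama_model.generate(
    input_ids,
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
    eos_token_id=eot_id,
    pad_token_id=eot_id,
)
filling = codeLlama_tokenizer.batch_decode(
    generated_ids[:, input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(filling)
```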