Best practices for AI use in research work

Research & Art

Best practices for AI use in research work

A practical collection of best practices for working with AI tools in research.

Note: This page complements the Responsible use of Generative Artificial Intelligence in the research process guidelines with practical tips. The content is based on the many interactions with 1000+ researchers over the last three years of teaching "AI and Research Work" (Glerean, Silva) [][], the mandatory "Research Ethics for Doctoral Students" course (AI topics covered by Glerean, Solin, Rehbinder), and the CodeRefinery workshop on "Responsible Use of Generative AI in Assisted Coding" (Glerean, CodeRefinery) [][]. Do you want to expand these practical guidelines? Get in touch with Enrico Glerean and Aalto Data Agents (researchdata@aalto.fi). We are preparing a MOOC which will be openly available from September 2026.

AI and research work

Artificial Intelligence (AI) tools, for example tools based on generative AI, are used throughout the research lifecycle: from literature search, to software coding, to editing manuscripts. Please review the Responsible use of Generative Artificial Intelligence in the research process guidelines if you did not already. This page helps you identify where "AI" can help you while being aware of the risks, within the principles of research integrity.

1. AI in your research: what is AI?

AI is a vague term, in the context of research AI can play three roles:

AI as a Topic: studying AI itself (e.g., human–chatbot interaction, new machine learning (ML) algorithms). Digitalisation and AI is one of the key research areas at Aalto University.
AI as a Method: using ML or other AI methods as the analysis approach in your research (e.g., classification, prediction).
AI as a Tool: using software tools with an "AI" (often generative AI) component, to assist tasks that are not themselves about AI (e.g., proofreading, code debugging).

For the majority of researchers, even those not studying AI as a topic or using it as a method, AI is indeed a tool. This guide focuses on (generative) AI as a tool.

2. Generative AI tools across the research lifecycle

Generative AI tools can assist at every stage of the research process:

Stage	Examples of AI assistance
Planning	Literature search, grant drafting, research question brainstorming
Data collection	Survey design, data synthesis, simulations
Pre-processing	Formatting, quality checks, data cleaning
Analysis	Analysis code generation, qualitative annotations
Preservation	Documentation, README generation, metadata
Sharing	Manuscript drafting, press releases, socual media posts
Reuse	Code documentation, dataset descriptions

Each use carries risk. There is no task that is always good or always bad with AI, it depends on the context and on the user of the AI tool. As a responsible researcher, your task is to evaluate the risks before deciding whether to proceed. If you are unsure, you can start a conversation with peers (e.g. with the data agents researchdata@aalto.fi) or just avoid using AI for that specific task. The figure below gives some examples on AI tools usage across the research lifecycle.

A visualisation of the table that is in the webpage: for each step of the research lifecycle there are examples of AI use.

3. Avoid the three forms of research misconduct: how AI can enable fabrication, falsification, and plagiarism

4. Evaluating risk: the expertise × output-risk matrix

Not all AI use is equal. If you are unsure, you can consider these two factors to determine how much caution is needed:

Your expertise on the task you are delegating to AI
The importance of the output: will it be submitted for peer review, or is it of low importance (e.g. a workshop webpage)?

	Low expertise	High expertise
High risk (peer-review submission: text, code, figures, references)	Use AI only to brainstorm: ask it what sources to read, get keyword suggestions, then go read the real sources. Example: find relevant keywords or ask which statistical methods might fit your data, then go read about them yourself.	Use AI for simple delegated tasks you can fully review. Check everything as carefully as if you had retyped it yourself. This is where overconfidence is most dangerous. Examples: generating short code snippets function-by-function. Text revision: Ask AI to mark suggested text changes in bold, so you can decide which ones to apply manually to your final text.
Low risk (event webpage, social media post, presentations)	Use AI, but accept a non-zero chance of errors slipping through. Example: help with CSS for a workshop webpage or vibe-coding a small demo.	Delegate most of the work to AI. Verify that the output makes sense. Example: generating documentation for scripts you wrote, or drafting an outline of a presentation from a transcription of your own recordings.

(matrix from Glerean, 2026, "AI and research work" in preparation)

Note 1: The sensitivity of your data increases the level of risks. With confidential or personal data you need to pay attention not only on how you use the tool, but also which tools you use.

Note 2: A special case that spans across levels is with generated software code when the proof can be formalised through automated unit tests, if the tests are well-defined and cover the relevant behaviour of the code. This can lower the risk level, but it shifts the quality requirements on the tests themselves.

Note 3: there might be other specific use cases that do not fall within this 2x2 matrix. You are the final responsible person who can evaluate if the use of AI is appropriate.

5. Choosing AI tools: data classification

The AI tool you use should match the sensitivity of the information you are sharing with it.
Our guidelines (Recommendation 3) require using Aalto AI Assistant for anything beyond
fully public data. The table below maps Aalto's four information classification levels
to the appropriate AI tool choice, putting that requirement into practice.

Level	Examples of data	AI tool
Public	Wikipedia content, published papers with CC license, public data	Any AI tool is acceptable
Internal	Meeting notes, expense reports, internal university pages	Prefer approved institutional tools
Confidential	Participant data, unpublished findings	Use only tools with contractual data protection guarantees (like ai.aalto.fi), or run AI models locally (e.g. on Triton cluster)
Secret	Data where a breach causes serious harm (e.g. medical records)	No cloud AI tools, only local tools inside a Trusted Research Environment like SECDATA

Remember: when using non-approved tools, also consider the interaction itself as data: whatever you type into an AI system may be used for training or could be accessed by others. As a good practical example treat interactions with Meta AI, Grok (xAI), and DeepSeek as fully public, regardless of the data classification of what you share and regardless of what they promise when it comes to privacy. These tools either operate under permissive privacy policies that allow use of your inputs for model training, or are provided by organisations whose data practices cannot be independently verified.

And even with public data, disclosing unpublished research ideas to any AI service carries a non-zero risk of being scooped or of the idea being exposed to other users / public internet.

6. Disclosure of the use of AI

The Aalto Universiy guidelines on responsible use of AI in research are rooted in the ALLEA European Code of Conduct's four principles: reliability, honesty, respect, and accountability. In practice, these mean the following for your manuscript preparation:

What to declare?

As an example, here are (please check the recommendations of your publisher before submitting):

No declaration needed: fixing typos and grammar only
Declaration required: any synthesis of text, generation of code, creation of figures, suggestion of analysis approaches, or drafting of any section
AI-generated images: acceptable only for illustrating a pipeline or method; never as result figures or quantitative plots

Disclosure template

When declaring AI use, include a statement such as ():

> Title of section: Declaration of generative AI and AI-assisted technologies in the manuscript preparation process

> Statement: During the preparation of this work, the author(s) used [NAME OF TOOL / SERVICE] in order to [REASON]. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the published article.

7. Practical tips

Here a series of practical tips and use cases with AI tools and research work, in no specific order.

Prompt engineering is dead, because LLMs are better than humans in writing prompts. So, rather than learning prompt engineering or bookmarking prompts that might not work anymore with new models, you can ask the AI to write the prompt for you and make sure you also instruct it to ask you clarifying questions before generating it.

Example 1: Here an example that has some confidential information about an ongoing research project, this could be used on Aalto AI Assistant ai.aalto.fi.

I am trying to choose which analysis method I should use with my data (transcribed interviews with company owners in Finland related to their attitude towards AI de-skilling their employees). Please make a prompt for me to be used with [model]. Before generating the prompt, ask me a few questions to refine it.

Example 2: Aalto AI assistant does not have harnesses to connect to the internet, so it cannot perform searches or check the literature. Here a prompt that uses the "deep research" feature in Google Gemini (OpenAI's ChatGPT and Anthropic's Claud have something similar) so that it can retrieve information from the internet. This prompt does not contain any confidential details:

I am a brain imaging researchers, but I am not familiar with genomics and brain imaging literature, please write a prompt to be used with Deep Research on Google Gemini so that it explores relevant literature from the past 5 years to identify important peer reviewed articles that combine genetics and brain imaging. Review articles should also be included. Before generating the prompt, ask me a few questions to refine it.

Caveat: while "deep research" seems amazing the first time you use it, many Aalto researchers who came to our past workshops have reported how it actually tends to hallucinate and report details that are not contained in the cited references. Consider deep research as a tool to discover some of the literature, but then please read the actual papers and performs usual searches with article databases.

Generative AI is very good at translating from one language to another, and one of the most useful cases is the translation from natural English to Python (or whatever programming language you work with). In the CodeRefinery lesson on "" are three ways of working with AI generated code:

Full control — manually copy-paste code function by function, review each piece, run it yourself. Lowest risk.
IDE with AI extension — AI suggests code inline; you approve line by line or block by block; you run the code. Moderate risk.
Full agent mode — you act as project manager; the agent writes and runs code autonomously. Highest risk.

Start with scenario 1. Move to scenario 2 only when you are confident you can review what is generated. For scenario 3 see the next section.

AI agents (especially coding agents) carry a higher risks because they can run independently. At least two types of risks should be mentioned.

Trust drift: the more you use an agent, the more you accept its suggestions without reviewing them. A practical solution is to force review checkpoints: sessions where you manually audit what the agent has done before proceeding.
Prompt injection: a malicious instruction embedded in an external file (a README, a dataset description) can hijack the agent's actions.

Before running agents on shared infrastructure, read the and review the risk table there. While most risks listed for HPC systems apply equally on a personal computer.

We also have some instructions on how to use coding agents on your computer. See the for examples and more in-depth coverage.

As you can guess, the general recommendation is to only use Aalto approved AI tools and Recommendation 3 of the guidelines applies here. However, often researchers need to compare various tools or need to use new features that just do not exist in the Aalto ecosystem. If you need to test a non approved tool, please follow these general recommendations:

Using only fully public data in your interactions (ideally public data that has no personal data)
Assuming the entire interaction is visible to anyone
Not disclosing unpublished hypotheses, methods, or findings

In some cases the task you need to do is specific, and does not require a large language model. For example transcription of .

Translations from one language to another can also be run locally. Our IT services and Research Software Engineers are working on those tools, in the meantime you can use Aalto AI Assistant and ask to translate the text for you.

AI can be used for generating images, but since the researcher generally does not hold copyrights over purely AI-generated images, they are not accepted for many scientific purposes, such as scientific publications. Nevertheless, this kind of tool can be helpful for aiding researchers in visual creation. Some acceptable strategies for integrating generative AI into scientific workflows include:

Conceptual mapping and aesthetics exploration

When some concept is too abstract, or when having some trouble in translating complex concepts into tangible representations, instead of starting with a blank whiteboard, researchers can use AI to generate the visual landscape of a concept. You can generate visual metaphors for data, experiment with distinct color palettes, textures, and rendering styles to establish a cohesive identity for a research project or grant proposal.
Create multiple variations of a concept

While the previous point is about exploring abstract ideas, this is about deciding how to visually execute a concept you have already defined. AI is capable of generating the exact same concept across many different art styles. You can tune it to sketch an image close to your target aesthetic, allowing you to rapidly test out different compositions, angles, and framings. Once you find the right visual approach, this AI-generated image can serve as a structural base or reference sketch to manually trace and create your own original vector artwork (e.g., using Inkscape or Illustrator).
Fast image generator for presentations

Sometimes you know exactly the image you want, but you do not have enough time to generate it. Visuals can often make the difference in capturing your audience, whether in a group meeting or a seminar presentation. Because these types of outputs are typically for internal purposes or temporary presentations, you do not necessarily need to hold the copyright. AI is great for generating background graphics, conceptual icons, or decorative elements that improve presentation slides.
Scripting for 3D modelling and rendering automation

Besides image models, AI is best for scripting. You can ask an LLM to write Python scripts for Blender (and similar) to programmatically generate precise geometries, arrange camera angles, or animate molecular structures.
Text-to-diagram syntax generation

Maintaining documentation, experimental setups, and project timelines requires clean and modifiable schematics. LLMs are very proficient at converting text descriptions into diagram code (e.g., Mermaid.js, Graphviz, PlantUML). This approach allows you to create flowcharts, diagrams, and roadmaps using code that can be edited, version-controlled via Git, and rendered natively inside markdown documents or lab notebooks.

Qualitative research has traditionally taken place primarily among the researcher, the data, and colleagues on the research team. What is fundamental to this kind of research is the researcher's interpretations and the responsibility to step back and reflect on their influence on the interpretation of the data. In the methodological literature, this is known as reflexivity, and it has long been considered central to qualitative research. In many qualitative traditions, analysis does not seek a single absolute truth; rather, it involves subjective interpretation of the material, treating it as one form of reality. This subjectivity is an integral part of the analytical process and serves as a resource for the researcher (Braun & Clarke, 2019). What counts as a defensible reading depends on the epistemological stance the researcher takes and the rationale behind the interpretive choices they make.

An AI system based on Large Language Models (LLMs) introduces an additional participant in the analytic exchange, supplying words and generating most of the text produced through a chat interface. However, AI does not function straightforwardly as a fully independent third participant, since it is not a coherent or stable entity, and the models operate as a superposition of multiple possible personas. Each prompt generates responses shaped by multiple layers, including pretraining on extensive text data, fine-tuning with example dialogues, and adjustments based on human feedback (such as RLHF), all of which contribute to the formation of the typical assistant persona. Moreover, the output is also affected by the prompt, session context, and memory. Since researchers interact with these systems in everyday language via chat, they can engage in back-and-forth exchanges that influence their thinking and decisions over time. A qualitative interpretation becomes more defensible when the researcher can account for its shaping, but as meaning becomes co-constructed through prolonged dialogue, maintaining reflexivity can be challenging. None of this makes AI unusable in qualitative work; when used reflexively, it can help you think through your own ideas, reorganize material, or test a reading against alternatives, as long as you consciously retain agency over the interpretive process. Here are some useful tips to keep in mind when doing qualitative analysis with an AI system:

Do your own interpretation first. Read the data you collect and form your own analytical reading before you involve AI, so the interpretation begins as yours.
Mind what you put into the system. Aalto’s guidance (/en/services/responsible-use-of-artificial-intelligence-in-the-research-process) is explicit that research data containing personal data should not be entered into external AI systems, so prefer Aalto-approved systems and treat the system’s output as potentially sensitive too.
Check the AI output against your data, and keep your own analysis separate. Test its suggestions against your material rather than treating them as a neutral reading. Because the system produces most of the words, you can easily slide from analyzing your data into curating its outputs, which may make the result harder to defend as your own interpretation.
Set the context the model works from. You cannot change the training, but you can set the instructions, the chat history, and what the system remembers. Use them on purpose, for example, to make the model push back on your reading of the data. These settings shape the output but do not fully control it, so use them to help you think and articulate ideas, but not to make interpretive decisions for you.
Stay reflexive in both directions by keeping a reflexive memo. Record your own shifting assumptions and position alongside the AI’s role in the analysis. Note the prompts you gave, what it suggested, and what you took up or set aside and why, so that every interpretive choice can be traced to your own reasoning.
Keep the chat logs as a record of the exchange itself. Your analytical output, in whatever form it takes, such as text, an artifact, or a document, was produced through the exchange itself, whether the system only replied in chat or acted across tools, files, and other agents you could not fully see. Reread conversations as the analysis progresses, paying attention to how the dialogue shaped your thinking.

8. Selected Q&A

Here some of the questions that have often come up in the research ethics course or in our "AI in research work" workshops.

A1: Manual checking is basically the only reliable method. A workflow from :

If a DOI or PubMed ID is present, verify it resolves to the cited article
If there is a mismatch, flag the reference
Rule out false positives (typos, abbreviation variants)
Search the title across four databases: PubMed, Crossref, OpenAlex, Google Scholar

AI detection tools (which are also based on AI...) are not reliable for this purpose.

A2: Horizon Europe has . ICML ran a with instructive findings on policy violations.

A3: Yes, the researchers who will be most resilient are those who can still reason carefully when AI is unavailable or wrong and are always able to go back to the original sources (fully read the papers, know how to check the documentation of software libraries). As every 7-year-old would tell you: Learning the multiplication tables is still worth doing, even if calculators exist.

A4: Findings in the literature are conflicting. In coding, the domain where AI performs best, productivity gains are measurable, but the cost is a higher volume of unmaintainable code and rapidly rising tool costs. In research more broadly, AI may remove some bottlenecks for individual researchers, but shift the bottleneck to peer review which remains a hard limit on the rate at which research can be validated and published. Productivity gains that bypass review quality are not gains for science.

A5: Some studies report clear impacts on the environment (), on the other hand data centers report that they use green energy and aim at zero-waist by 2030 (see ). In general, the energy costs for large AI systems are higher than running a simple web search on a search engine (that does not use AI...), so if the question you have is something that can be solved with a web search, you should consider doing that rather than asking some AI chatbot. Examples of good search engines that allow you to switch off AI results and are also privacy friendly are: ,

A6: Academic freedom is one of the core principles of research. You should never feel forced to use AI in doing your work, especially when you know that it causes more harm than good. It is however important to learn how these tools were built and how they can (or not) work for your research case. Using generative AI systems built on data scraped , or might not align with your ethical principles. Consider using AI tools which were built responsibly. Unfortunately this is easier said than done: let’s work on this together!

Research misconduct in Finland (and broadly in academia) means fabrication, falsification, and plagiarism. If not used carefully, generative AI can be the ideal misconduct machine as it can very easily engage with any of the three malpractices.

Fabrication: Generative AI systems fabricate all the time, and those fabrications are often called "hallucinations": they can be plausible-looking text, references, statistics, or data that do not exist. A 2026 study () found that over 2800 peer-reviewed papers published in the past two years contain fabricated citations. ArXiv preprint service now imposes a for manuscripts containing AI-fabricated references or other irrefutable AI slop, and publishers are adopting a similar policy . Using AI to help format or suggest citations without manually verifying each one is a significant risk. This applies even to supposedly "safe" reformatting tasks: an AI that rearranges a reference list can change author order, publication year, or invent a new title or journal name (this is more likely to happen if the list is long). Generative AI is also quite bad at summarising articles: it exhagerates or excessively paraphrases the findings, up to the level of fabrication ().

Falsification: AI tools do not reason in the same way humans do, they just predict the statistically likely next token given their training data. Using AI to produce research findings or interpret results creates a high risk of false conclusions. Quantitative and qualitative analysis must be performed with methods that the researcher can validate and re-run reliably (reproducibility). When AI is used to suggest an analysis approach, always independently re-run or verify the key steps with a known method before trusting the result.

Plagiarism: Large language models are trained on basically all available human text (and sounds and images). They synthesise text without citing sources. A recent study () showed it was possible to extract near-verbatim up to 95% of a Harry Potter book from a large language model. Even if you are not copy-pasting directly from AI generated output to avoid plagiarism, using AI to brainstorm an idea you later present as original, without checking whether that idea already appears in the literature, can constitute plagiarism.

9. Video lectures on the topic of AI and research

This section is only visible if you are logged in. The section contains videos from the Research Ethics Course for Doctoral students, by Arno Solin (Ethics in AI Research), Enrico Glerean (Generative AI and Research Integrity), and Maria Rehbinder (AI Act). We are preparing a mooc with these videos and much more. Please get in touch if you want to contribute to the MOOC.

Conclusions

In the current era of Artificial Intelligence expanding in all aspects of our lives, responsible researchers are not those who avoid AI completely, nor those who delegate everything to it. Responsible researchers understand how these systems work, what they are delegating. They can verify the output, consider the data and legal risks, disclose AI use honestly, and preserve the core human skills needed for research: reading, reasoning, documentation, communication, and accountability.

Contacts

These guidelines focus on one particular type of AI used in the research process: generative AI. The goal of these guidelines, originally provided by the European Research Area forum, is to prevent misuse and to ensure that generative AI plays a positive role as part of research practices.
The key principles for the responsible use of generative AI in research are:
• Reliability in ensuring the quality of research, reflected in the design, methodology, analysis and use of resources. This includes aspects related to verifying and reproducing the information produced by the AI for research. It also involves being aware of possible equality and non-discrimination issues in relation to bias and inaccuracies.
• Honesty in developing, carrying out, reviewing, reporting and communicating on research transparently, fairly, thoroughly and impartially. This principle includes disclosing that generative AI has been used.
• Respect for colleagues, research participants, research subjects, society, ecosystems, cultural heritage and the environment. Responsible use of generative AI should consider the limitations of the technology, its environmental impact16 and its societal effects (bias, diversity, non-discrimination, fairness and prevention of harm). This includes the proper management of information, respect for privacy, confidentiality and intellectual property rights, and proper citation.
• Accountability for the research from idea to publication, for its management and organisation, for training, supervision and mentoring, and for its wider societal impacts. This includes responsibility for all output that a researcher produces, underpinned by the notion of human agency and oversight.

Services

Cute white AI robot diagram showing data inputs, processing icons and green output symbols for tasks and tools

Coding assistants are AI-based tools that help you to write programming code, review it and make development processes faster.

Services

Digital illustration of clouds, arrows, and a microphone representing data flow and cloud computing.

Speech-to-Text is a multilingual, AI-powered transcription service.

Services

A safer way for Aalto researchers, teachers and students to build tools with large language models within Aalto’s infrastructure.

4.6.2026 News

Tips how to use AI in teaching

Services

Black circles and squares connected by dotted lines on a blue background.

AI systems, AI models and copyright questions related to them.

Services

Updated: 26.6.2026
Published: 26.6.2026

��

Research & Art